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ABSTRACT 

A  Two-Dimensional  Fast  Recursive  Least  Squares  (2-D  FRLS)  algorithm  is  pre- 
sented using  a  geometrical  formulation  based  on  the  mathematical  concepts  of  vector 
space,  orthogonal  projection,  and  subspace  decomposition. 

By  appropriately  ordering  the  2-D  data,  the  algorithm  provides  an  exact  least- 
squares  solution  to  the  deterministic  Normal  equations.  The  method  is  further  extended 
to  the  general  FIR  Wiener  filter  and  to  ARM  A  modeling.  The  size  and  shape  of  the 
support  region  for  both  the  MA  and  AR  coefficients  of  the  filter  can  be  choosen 
arbitrarly.  The  ARMA  parameter  estimation  problem  is  also  considered  for  the  case 
when  the  system  input  is  not  available. 

Computer  simulations  are  presented  to  illustrate  the  applications  of  the  algoritm  for 
2-D  parameter  estimation,  system  identification  and  image  coding. 
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I.  INTRODUCTION 

Adaptive  algorithms  have  been  used  successfully  for  many  years  in  a  wide  range 
of  digital  signal  processing  applications  involving  non-stationary  data.  In  these  appli- 
cations it  is  desired  to  follow  closely  the  variations  of  the  parameters  characterizing 
the  process,  by  updating  (in  real  time  if  possible)  estimates  of  these  parameters  as 
soon  as  new  data  is  available.  Real  time  implementation  of  these  algorithms  only 
recently  became  possible  with  the  latest  capabilities  of  the  VLSI  technology  and  is 
partly  a  result  of  the  development  of  computationally  affordable  algorithms  based  on 
a  very  elegant  mathematical  formulation.  This  formulation  is  known  as  fast  recursive 
least  squares  (FRLS)  and  is  based  on  a  geometric  approach.  The  derivation  of  al- 
gorithms based  on  the  geometric  approach  uses  the  concepts  of  linear  vector  spaces, 
orthogonality,  projection  matrices,  and  their  relation  with  least  squares  prediction 
[Ref.  1.  2,  3].  Motivation  for  the  development  of  similar  algorithms  to  process  two- 
dimensional  (2-D)  data  is  a  consequence  of  the  very  interesting  results  reported  lately 
in  the  literature  on  adaptive  filters  for  one-dimensional  (1-D)  signals  in  what  concerns 
their  reduced  complexity  and  excellent  behavior  even  in  non-stationary  environments 
[Ref.  4,  5].  The  development  of  Fast  RLS  algorithms  for  2-D  data  based  on  the 
geometric  approach  is  what  is  addressed  in  this  thesis. 

A.      PROBLEM  FORMULATION 

A  major  problem  with  the  extension  of  Fast  RLS  algorithms  to  two  dimensions 
is  that  causality  is  not  inherent  in  2-D  systems.  This  problem  was  overcome  by 
associating  the  past  of  a  2-D  signal  with  the  region  of  support  of  a  recursive  filter  mask 
(usually  quarter  plane  or  non-symmetrical  half  plane).  By  appropriately  ordering  the 
2-D  data,  a  two-dimensional  Fast  Recursive  Least  Squares  (2-D  FRLS)  algorithm 


is  developed  using  a  geometrical  formulation  where  the  vector  spaces  and  all  the 
notions  associated  with  them  are  defined  to  reflect  the  2-D  nature  of  the  data.  An 
exact  least  squares  solution  to  the  deterministic  Normal  equations  is  provided  for 
all  of  the  all-pole  (AR),  all-zero  (MA),  or  pole-zero  (ARMA)  models.  The  size  and 
shape  of  the  support  region  for  both  the  MA  and  AR  coefficients  of  the  filter  can 
be  chosen  arbitrarily  as  long  as  the  overall  system  is  recursively  computable.  The 
ARMA  parameter  estimation  problem  is  also  addressed  for  the  case  when  the  system 
input  is  not  known. 

B.     THESIS  OVERVIEW 

The  remainder  of  this  thesis  is  organized  as  follows.  Chapter  2  provides  a  sum- 
mary of  the  most  common  adaptive  filtering  techniques.  Only  brief  reference  is  made 
the  Least  Mean  Square  (LMS)  algorithm  due  to  its  simplicity  and  slower  convergence 
properties.  However  a  2-D  version  of  this  technique  that  was  recently  reported  in  the 
literature  is  mentioned.  Most  of  the  chapter  reviews  the  basic  ideas  of  the  Recursive 
Least  Squares  (RLS)  algorithm.  This  provides  preparation  for  chapter  3  where  a  fast 
2-D  version  of  this  algorithm  is  developed  in  detail. 

Chapter  4  is  dedicated  to  applications  of  the  new  algorithm  to  the  2-D  problems 
of  systems  identification,  image  coding,  and  parameter  estimation.  Different  models 
(AR,  MA,  and  ARMA)  are  considered. 

Chapter  5  summarizes  the  results  obtained  and  suggests  some  possible  improve- 
ments for  the  new  algorithm.  Mathematical  derivations  related  to  the  geometrical 
formulation  that  are  essential  to  the  method,  but  too  tedious  to  be  inserted  in  the 
body  of  the  thesis,  are  grouped  in  the  Appendix. 


II.  ADAPTIVE  FILTERS 

A.     PERFORMANCE  CRITERIA 

In  adaptive  filtering  the  performance  criterion  is  usually  based  on  the  minimiza- 
tion of  a  cost  function  dependent  on  the  filter  coefficients  to  be  determined.  The  most 
common  performance  criterion  is  the  minimization  of  the  mean  square  error  (MSE) 
associated  with  the  signal  to  be  estimated  [Ref.  5,  6,  7].  In  particular,  if  we  consider 
a  random  process  y{n)  and  a  predictive  filter  of  the  form 

M 
y(n)  =  J2hn(k)y(n-k)  (1) 

Jt=i 

where  y{n)  is  the  predicted  value  and  hn(k)  are  the  filter  coefficients,  then  the  pre- 
diction error  is  defined  by 

e(n)  =  y{n)  -  y(n)  (2) 

and  the  MSE  becomes 

e  =  E[e\n)\  (3) 

where  E  is  the  expectation  operator.  For  stationary  data,  this  quantity  is  a  convex 
quadratic  function  of  the  filter  coefficients  hn(k)  and  attains  its  minimum  at  a  point 
where  the  partial  derivatives  with  respect  to  each  of  the  filter  coefficients  are  simul- 
taneously equal  to  zero.  Substituting  (1)  and  (2)  in  (3)  and  simplifying,  we  obtain 
the  desired  expression 

-^j^  =  -2E[e(n)y(n  -  k)}  =0      for  k=  1, ...,  M  (4) 

The  dependence  of  the  performance  criterion  on  the  filter  coefficients  can  be  in- 
terpreted in  terms  of  a  multidimensional  convex  surface  with  a  unique  minimum.  This 


surface  is  called  the  error-performance  surface.  The  coefficients  associated  with  the 
minimum  mean-square  error  are  obtained  by  solving  the  set  of  simultaneous  equations 
in  (4)  which  in  this  case  are  known  as  the  Normal  equations. 

B.       LEAST  MEAN  SQUARE  (LMS)  ALGORITHM 

1.      1-D  LMS 

The  Normal  equations  can  be  solved  by  brute  force  using  matrix  inversion 
or  by  computationally  faster  methods  such  as  the  Levinson-Durbin  algorithm  for 
Toeplitz  matrices.  However  here,  we  are  interested  in  a  method  called  the  steepest 
descent,  which  provides  an  iterative  solution  to  the  Normal  equations  [Ref.  5,  6,  7,  8]. 
We  start  with  a  initial  set  of  filter  coefficients  and  a  corresponding  point  on  the  error 
performance  surface.  We  then  compute  the  gradient  vector  formed  by  the  partial 
derivatives  of  the  mean-squared  error  with  respect  to  each  of  the  filter  coefficients  at 
that  point.  Using  (4)  the  gradient  vector  can  be  expressed  as 

V(n)  =  -2£[e(n)y»]  (5) 

where  y(n)  is  a  M  x  1  vector  that  contains  the  data  covered  by  the  filter  mask  at 
time  n 

y(n)  -  [y(n  -  l),y(n  -  2),  ...,y(n  -  M)]T  (6) 

Finally  we  update  the  coefficients  by  changing  them  in  a  direction  opposite  to  that 
of  the  gradient  vector  using  a  predefined  step  size  // 

hn+i  =  hn  +  i//[-V(n)]  =  hn  +  (iE[e(n)y(n)}  (7) 

where  hn  is  allxl  vector  that  contains  the  filter  coefficients  at  time  n 

hn  =  [hn(l),hn(2),...,hn(M)]T  (8) 

The  inconvenience  of  this  approach  is  that  it  requires  an  exact  measure  of  the  gradient 
vector  at  each  iteration  and  the  gradient  involves  statistical  expectation.  Usually  the 
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statistics  of  the  data  are  not  available  and  must  be  estimated  from  the  raw  data. 
Thus  to  obtain  a  really  accurate  estimate  of  the  gradient  is  quite  cumbersome  and 
computationally  expensive. 

A  practical  method  to  estimate  the  gradient  vector  in  a  simple  manner 
directly  from  acquired  data  is  used  in  the  Least  Mean  Squares  (LMS)  algorithm  of 
Widrow  [Ref.  5].  For  this,  the  expectation  in  (5)  is  ignored  and  an  instantaneous 
estimate  of  the  gradient  vector  is  taken  to  be 

V(n)  =  -2e(n)y(n)  (9) 

The  update  for  the  filter  coefficients  then  has  the  form 

hn+]  -  hTl  +  2^["V(n)]  =  hn  +  fie(n)y(n)  (10) 

This  method  is  quite  attractive  for  a  wide  range  of  applications  since  it  requires  no 
matrix  inversions,  correlation  function  estimation,  or  (actual)  gradient  computation, 
and  hence  has  low  computational  complexity.  However  its  convergence  is  relatively 
slow.  A  detailed  derivation  and  analysis  of  the  LMS  properties  can  be  found  in  [Ref. 

5.  6]. 

2.      2-D  LMS 

The  extension  of  the  LMS  algorithm  to  2-D  signals  is  straightforward. 
Reference  [Ref.  9]  gives  a  detailed  derivation  and  shows  that  the  analysis  presented  by 
other  authors  for  the  1-  D  LMS  is  also  applicable  to  the  2-D  version  of  the  algorithm. 

The  final  form  of  the  2-D  LMS  algorithm  is  very  similar  to  (10)  but  the 
vector  containing  the  1-D  filter  coefficients  hn  is  substituted  by  a  matrix  Wj  con- 
taining the  2-D  filter  coefficients.  The  instantaneous  estimate  of  the  2-D  gradient 
uses  the  data  matrix  X;  formed  by  the  2-D  input  samples  covered  by  the  2-D  filter 
mask  at  iteration  j.  For  a  N  x  M  2-D  sequence,  at  sample  y(n,  m)\  if  j  is  the  linear 


scanning  index 

j  =  mM  +  n  (11) 

then  the  algorithm  takes  the  form: 

Wi+,  =  W,  +  i/x[-V(n)]  =  W;  +  ^X;  (12) 

A  separate  derivation  of  this  algorithm  was  also  presented  in  [Ref.  10],  together  with 
some  examples  of  its  performance  through  computer  simulation  of  a  noise  canceler 
and  an  adaptive  line  enhancer  applied  to  an  image  processing  problem. 

C.      1-D  RECURSIVE  LEAST  SQUARES  (RLS) 

In  the  adaptive  methods  presented  above  the  need  to  solve  the  Normal  equa- 
tions appears  as  a  consequence  of  the  minimization  of  a  statistical  cost  function, 
the  mean-squared  error.  However  the  implementation  instead  uses  the  actual  data  to 
compute  errors  and  update  the  coefficients.  This  is  the  main  cause  of  the  performance 
deficiencies  encountered  when  implementing  this  algorithm. 

Another  possible  approach  is  to  base  the  performance  criterion  upon  error  mea- 
sures derived  from  the  actual  data.  This  class  of  techniques  is  known  generally  as 
Least  Squares  (LS)  algorithms. 

The  LS  algorithm  is  designed  to  find  the  set  of  filter  coefficients  that  minimize 
the  cumulative  sum  of  squared  errors. 

e(n)  =  £>2(i)  (13) 

t=i 

Although  this  seems  very  similar  to  the  previous  performance  criterion,  it  results  in  a 

set  of  deterministic  Normal  equations  whose  solution  provides  filter  coefficients  that 

are  exactly  optimal,  according  to  (13),  for  the  acquired  data  instead  of  statistically 

optimal  for  a  class  of  data  as  in  the  case  of  the  steepest  descent  methods  [Ref.  6]. 


To  formulate  this  problem  we  once  again  differentiate  the  cost  function  with 
respect  to  the  coefficients.  However  since  we  here  use  summations  instead  of  expec- 
tations the  result  is 


j^-j-  =  -2rn(0,k)  +  2y£hn(l)rn(k,l)  =  0      for  h  =  1,...,M 
where  we  define  the  deterministic  correlation  function  rn(k,l)  as 

n 

rn(k,l)  =^2y(n-  k)y{n  -I)      for  k,m  =  0,...,M 


(14) 


(15) 


This  set  of  M  simultaneous  equations  are  the  deterministic  Normal  equations.   The 
equations  are  written  in  matrix  form  as 


R(n)hn  =  E(n) 


where  R(n)  is  the  M  x  M  deterministic  correlation  matrix  with  the  structure 


(16) 


R(n)  = 


rn(l,l)      rn(l,2) 
rB(2,l)      rn(2,2) 


rn(l,M) 
rn(2,M) 


(17) 


rn(M,l)    rB(Af,2)    •••    rn(M,M) 

and  r(n)  is  the  M  x  1  vector  of  deterministic  cross-correlation  terms  between  the 
desired  filter  response  and  the  filter  inputs. 


r(77)  =  [rn(0,l).rn(0,2),---,rn((U/)]: 


If  R(n)  is  nonsingular  then  the  solution  to  the  Normal  Equations  is  formally 


(18) 


hn  =  R-»r(n) 


(19) 


This  brute  force  solution  requires  on  the  order  of  M3  arithmetic  operations.  A  better 
approach  is  to  use  a  method  known  as  the  recursive  least  squares  (RLS).  This  uses 
the  Matrix  Inversion  Lemma  [Ref.  5]  to  update  the  inverse  correlation  matrix  and  the 


cross  correlation  vector  as  new  data  is  acquired  and  thus  to  compute  h„  recursively. 
The  resulting  expression  for  R-1(n)  is 

R     (n)  =  R     (n  —  1) „  ~  ^  ~ : — — (20) 

By  defining  the  a  priori  error  e(n\n  —  1)  as 

e{n\n-l)  =  y{n)-yT(n)hn_1  (21) 

and  the  gain  vector  k(n)  as 


R-'(n-l)y(n) 
l+yT(n)R-1(n-l)y(n) 


«")=,    ..,r/-'P-i/ -    Yw-x  (22) 


we  can  rewrite  (20)  as 

R-](77)  =  R-1(77-l)-k(n)yT(n)R"1(r7  -1)  (23) 

If  we  then  use  c(n)  =  r(?7  —  1)  +  y(n)y_(n)  and  substituting  in  (19)  the  desired  update 
for  coefficient  vector  ru  is  found  to  be 


— n 


hn=hri_1+k(n)e(n|n-l)  (24) 

To  update  the  coefficient  vector  as  new  data  is  acquired  all  we  must  do  is  to  compute 
the  last  four  equations  assuming  that  all  the  parameters  with  index  n—  1  are  available 
at  time  77.  Since  the  non-singularity  of  the  deterministic  correlation  matrix  is  a 
requirement  for  the  solution  of  the  problem,  we  must  start  with  the  initial  condition 

R-1(0)  =  c-1IMxA/  (25) 

where  c  is  a  small  positive  constant.  It  also  customary  to  initialize  all  the  components 
of  the  coefficient  filter  hn  to  zero. 

The  RLS  algorithm  is  computationally  more  expensive  than  the  LMS.  The  RLS 
algorithm  requires  a  total  of  3M(3  +  M)/2  multiplications/divisions  per  iteration, 
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while  the  LMS  algorithm  requires  only  1M  -f  1.  On  the  other  hand  the  convergence 
performance  of  the  RLS  algorithm  is  much  superior  [Ref.  5,  6].  A  detailed  derivation 
of  this  algorithm  and  its  overall  performance  can  be  found  in  man}'  references  such 
as  [Ref.  5,  6,  8]. 

An  enormous  reduction  of  the  computational  complexity  of  the  RLS  method 
is  obtained  by  using  a  geometrical  formulation  in  its  derivation.  The  computational 
complexity  of  the  algorithm  is  reduced  to  approximately  6A/  arithmetic  operations. 
The  geometrical  approach  also  provides  an  interesting  interpretation  of  the  prediction 
problem  in  terms  of  the  concepts  of  vector  spaces  and  orthogonality.  Since  we  will 
use  an  expanded  version  the  geometrical  formulation  to  derive  the  extension  of  this 
method  to  2-D  signals,  and  the  derivation  is  lengthy,  we  will  not  derive  the  1-D  method 
here.  However  a  very  comprehensive  explanation  of  the  geometrical  approach  for  1-D 
signals  can  be  found  in  [Ref.  6]. 

For  the  case  of  nonstationary  data  it  is  frequently  advantageous  to  incorporate 
a  forgetting  factor  in  the  cost  function 

n 

c(n)  =  X>n~'e2(0      forO<A<l  (26) 

The  interpretation  of  this  forgetting  factor  can  be  understood  as  an  exponential  win- 
dowing of  the  data  in  a  fashion  such  that  the  most  recent  data  has  a  heavier  influence 
in  the  cost  function  to  be  minimized.  The  fast  algorithm  is  mathematically  equivalent 
to  the  RLS,  hence  its  stability  is  guaranteed  in  theory  for  any  possible  forgetting  factor 
A  [Ref.  7].  However  the  efficiency  of  this  class  of  algorithms  is  a  result  of  the  reduced 
number  of  variables  used  to  represent  quantities  such  as  the  inverse  deterministic 
correlation  matrix.  Due  to  the  finite  precision  arithmetic  used,  the  representation  is 
only  approximate.  As  a  result  the  accumulation  of  round-off  errors  can  set  off  insta- 
bility of  the  algorithm. The  sensitivity  of  some  quantities  to  round-off  error  are  highly 


dependent  on  the  forgetting  factor  used.  This  imposes  a  lower  bound  on  A.  Typical 
values  for  A  are  in  the  range: 

0.95  <  A  <  1.0  (27) 
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III.  2-D  FAST  RECURSIVE  LEAST  SQUARES 

In  this  chapter  we  present  the  derivation  of  the  2-D  FRLS  algorithm.  As  men- 
tioned before  we  use  the  geometrical  formulation  to  obtain  a  fast,  computationally 
efficient  algorithm. 

We  start  by  introducing  a  convenient  notation  that  closely  follows  the  one  used 
by  Alexander  [Ref.  6]  for  the  geometric  derivation  of  the  1-D  FRLS.  This  is  followed 
by  a  brief  set  of  vector  space  considerations  that  are  the  basis  of  the  problem  solution. 
Next  some  auxiliary  filters  that  use  the  same  data  as  the  2-D  filter  are  introduced. 
The  key  to  this  method  is  to  find  a  relation  between  the  parameters  of  these  filters 
that  permits  the  recursive  update  of  all  of  them  as  soon  as  new  data  is  available. 

A.      2-D  PREDICTION  FILTERS 

The  method  to  be  described  applies  to  a  general  2-D  prediction  filter  of  the  form 

y{nun2)  =  J3XIau2/("i  -  i,n2  -  j)  (28) 

*     j 

with  {i,j)  defined  in  a  region  that  allows  the  2-D  AR  model  related  to  the  prediction 

error  process  to  be  recursively  computable  (ex:  quadrant  or  non-symmetric  half  plane 

support).  The  recursive  computability  is  a  requirement  for  applications  where  inverse 

filtering  is  used  to  recover  the  2-D  data  sequence  from  the  estimated  error  sequence 

as  in  most  of  the  image  coding  schemes.  To  be  specific  and  develop  clear  notation  we 

will  assume  a  first  quadrant  (A'  +  1)  x  (M  +  1)  filter  of  the  form 

A'     M 

y(ni,n2)  =  J2T,aiJy(ni  -  *>2-i)       («\i)  f  (0,0)  (29) 

i=0 ;=0 

and  a  2-D  data  sequence  with  K  x  L  points  as  shown  in  Figure  1. 
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Figure  1.  First  Quadrant  (N+1)(M+1)  Filter 

The  prediction  error  e(n1?n2)  =  y{ni,n2)  —  y(ni,n2)  can  be  expressed  using 
vector  notation  as 

e(n\,n2)  =  y(nun2)  -yT(nun2)a  (30) 

where 


y{nun2)     =     [y(r?j  -  l,n2),--  •  ,y(n}  -  N,n2),y(nu  n2  -  1), 
••  ',y(ni  -  N,n2  -  l),---,y(ni  -N,n2-  2),- 

•••,y(n1,n2  -M),---,y(n1  -  Ar,n2  -  A/)]T 


(31) 


is  a  (N  +  1)(A/  +  1)  —  1  dimensional  vector  formed  by  the  elements  covered  by  the 
filter  mask,  ordered  along  rows,  and 


a  =  [a10,  •  •  • ,  a^o,  a0i,  •  •  •  ,  a/vi,-  •  •  ,  aoA/r  '  ■  ,0-NM] 


(32) 


is  the  vector  of  2-D  filter  coefficients  with  the  same  ordering.  We  assume  for  now  that 
y(k,l)  =  0  for  k  <  0  or  /  <  0. 
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When  performing  linear  prediction  along  one  of  the  possible  directions  of  recur- 
sion (e.g.,  along  rows),  and  all  of  the  data  necessary  for  each  prediction  is  available. 
the  optimal  filter  (least  squares  criterion)  up  to  point  (n1,r?2),  will  be  defined  as  the 
one  having  a  set  of  coefficients  atJ(n1,n2),  (  0  <  i  <  N  ,  0  <  j  <  M  ,  (i,j)  i=-  (0,0)  ) 
that  minimizes  the  sum  of  the  squared  errors  along  that  recursion  direction. 

*(ni,  n2)  =  !>(•,  "2)]2  +  £  nE[e(t,i)]2  (33) 

t=o  t=o  j=o 

B.     2-D  DATA  ORDERING 

One  question  that  arises  whenever  we  deal  with  finite  extent  sequences  is  what 
to  do  when  we  approach  the  boundaries  of  the  2-  D  data  sequence  and  the  prediction 
mask  needs  to  cover  points  that  lie  outside  of  the  region  where  the  data  is  defined. 
One  approach  is  to  set  the  points  outside  of  this  region  (i.e.,  the  boundary  conditions) 
to  zero.  The  inconvenience  of  this  approach  is  that  the  boundary  conditions  depend 
not  only  on  the  extent  of  the  2-D  sequence  but  also  on  the  shape  of  the  filter  mask 
and  this  can  lead  to  additional  complications  [Ref.  11].  In  addition,  when  we  reach 
the  end  of  a  row  and  we  start  a  new  one,  the  data  under  the  mask  is  almost  all  reset 
to  zero.  This  causes  a  strong  discontinuity  in  the  process. 

An  alternative  approach  is  to  assume  that,  although  the  2-D  data  we  process 
may  not  be  stationary,  the  statistical  properties  of  the  data  do  not  vary  too  rapidly, 
and  so  to  use  the  data  at  the  end  of  one  row  as  the  initial  condition  for  prediction 
along  the  next  row  as  is  shown  in  Figure  2.  This  appears  to  be  at  least  as  reasonable  as 
the  first  approach.  It  will  be  seen  later  that  this  approach  also  has  several  advantages 
in  deriving  an  algorithm  based  on  the  geometrical  approach.  From  a  practical  point 
of  view,  it  is  as  if  we  fold  the  2-D  data  plane  and  form  a  cylinder  with  perimeter 
equal  to  A',  but  the  data  rows  instead  of  folding  into  themselves,  are  misaligned  by 
one  row.  This  allows  the  prediction  to  be  performed  along  rows  with  the  2-D  mask 
moving  from  bottom  to  top  in  a  helical  fashion. 
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Figure  2.  Data  Ordering  v 

C.  PREVIOUS  DATA/  PAST  OBSERVATIONS 

Performing  linear  prediction  on  a  2-D  signal  along  rows  implies  that  the  new 
data  comes  only  from  a  strip  with  the  width  of  the  filter  mask  (M  -f  1).  This  suggests 
an  analogy  with  the  (M  +  1)  channel  1-D  prediction  problem. 

1.      (M-fl)  Channel  Analogy 

The  (M  +  1)  rows  of  the  data  strip  can  be  viewed  as  (M  +  1)  channels  of 
a  1-D  signal.  To  support  this  idea,  define  a  linear  index  n  such  that 


n  =  n2  x  K  -f  i\\ 


(34) 
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Then  the  2-D  data  sequence  t/("i,n2)  can  be  expressed  in  terms  of  the  data  in  the 
first  channel  as 

y{ni,n2)  =  yi(n)  (35) 

We  can  also  express  the  data  in  the  other  channels  in  terms  of  1/1(7?)  as 


Vi{n)  =  yi{n-{i-  1)A') 


for  i  =  1,...,M  +  1 


(36) 


We  will  predict  along  the  first  channel  using  data  from  all  channels  defined  by  the 
2-D  filter  mask.  A  consequence  of  this  approach  is  that  the  length  of  data  used  from 
each  channel  depends  on  the  shape  of  the  2-D  mask.  Figure  3  shows  the  particular 
case  of  a  Quarter  Plane  mask.  In  order  to  predict  y\(n)  =  i/(ni,n2),  the  data  used 
is  formed  by  N  samples  of  channel  1  (7/1)  and  (N  +  l)  samples  of  channels  2  (y2)  to 
M  +  1  (t/A/+i)-  This  requires  a  total  of  (N  +  1)  x  (M  +  1)  -  1  coefficients  as  in  the 
2-D  mask. 


Channel  1 

Channel  2 
Channel  3 


Channel  M+l 


Figure  3.  M+l  Channel  Analogy 
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2.      Past  data 

The  multichannel  analogy  will  be  used  to  define  what  we  call  the  past  data 
and  the  observation  data  because  the  notion  of  multichannel  prediction  will  be  useful 
later.  From  now  on  we  will  drop  the  2-D  notation  and  use  the  index  n  associated 
with  the  recursion  along  the  rows. 

Begin  by  defining  the  (n  +  l)-dimensional  vector 

y»  =  [y.(0),y,(l), ,y,(n)]T  1  <  i  <  M  +  1  (37) 

that  contains  data  in  channel  i  up  to  the  point  n.  Further  define  the  delay  operator 
z~k  such  that 

z-kyt(n)  =  [0.0,  •  •  •  ,yi(0),yi(l),  ■  ■  •  ,</,(n  -  k))T  (38) 

is  a  (n  -f  l)-dimensional  vector  that  contains  the  data  of  y.(n)  delayed  by  k  samples 
and  pre-windowed.  We  call  y(n)  the  observation  data  since  it  contains  the  data 
sequence  we  desire  to  predict.  The  past  data  with  respect  to  yi(n)  is  formed  by  all 
of  the  points  yx{j)  such  that  (2  <  i  <  M  +  1,0  <  j  <  n)  or  (i  =  1,0  <  j  <  n  -  1) 
as  shown  in  Figure  4.  Note  that  different  channels  have  data  in  common.  With  the 
new  notation  defined  we  are  ready  to  reformulate  the  problem. 

D.      PROBLEM  REFORMULATION 

We  start  by  redefining  (31),  the  vector  that  contains  the  data  covered  by  the 
2-D  filter  mask,  as 

YhN(n)  =  M"  -  1),'  •  *  ,2/i("  -  N),y2(n),---,yM+1{n),---,yAf+1(n  -  N))       (39) 

where  the  subscript  (1,N)  denotes  the  fact  that  the  data  in  the  first  channel  appears 
delayed  by  1  to  N  samples.  We  want  to  find  a  prediction  filter  of  the  form 

yi(n)  =  ylN(n)^(n)  (40) 
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y(n) 

^z^>^z^% 

ill 

ill 

Figure  4.  Past  Data 


with 


a(n)  =  [a10(n),  •  •  •  ,ayv0(")>  a01,  •  •  •  ,aNl(n),  •  •  •  ,a0W(n),  •  •  •  ,aNAf  (n)]T  (41) 


that  minimizes  the  sum  of  squared  errors 

«i(»)  =  I>i(0]2 

i=0 

where  the  prediction  error  given  by 

ei(")  =2/i(")-yi(n) 
This  can  be  written  in  vector  notation  as 


(42) 


(43) 


ci(n)  =  ef  (njfi^n) 


where 


(44) 


£i(»)    =    Y.i(n)-li(n) 


(45) 
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anc 


X1(n)  =  Y1,N(n)a(n)  (46) 

where  YlA'(n)  is  a  data  matrix  formed  by  the  data  covered  by  the  2-D  mask  from 
the  origin  up  to  time  n.  The  matrix  Y-[^(n)  can  be  written  as 


Yi.N(n)  = 


(47) 


or  using  a  different  partition,  Y1(/v(r?)  can  alternatively  be  written  in  terms  of  the 
data  in  the  M  4-  1  channels  and  their  delayed  versions  (37)  and  (38)  as 


YliN(n)  = 


1y1(")r--,^yvy1H,y2(n)^-1x2(n),---^-^M+i(n)J  (48) 

As  will  be  shown  later,  a  necessary  initial  condition  for  the  geometric  formulation 
to  work  is  that  we  start  at  a  sample  yi(0)  such  that  y  (0)  =  0.  That  is,  the 
initial  conditions  for  the  data  under  the  2-D  mask  must  be  zero.  Now  let  us  proceed 
with  the  minimization  of  (42).  The  least  squares  solution  for  a(n)  is  given  by  the 
pseudo-inverse: 

fl(n)     =     (Y[,A.(n)Y1,N(n))"1Y1rN(n)y](rO  (49) 

where  fY^A-(n)YiiAf(n)J  can  be  interpreted  as  the  inverse  of  a  2-D  deterministic 
correlation  matrix  and  YjN(n)y  (n)  can  be  interpreted  as  the  vector  of  the  deter- 
ministic cross-correlations  between  the  observation  data  and  the  past  data.  A  new 
expression  for  e^n)  is  obtained  substituting  the  solution  for  a(n)  in  (46)  and  using 
the  result  in  (45).  This  yields 


fiiW  =  y1{n)-'PiMn)y1(n) 

=     (I-P1,jV(n))yi(n) 
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(50) 


where 

Pi.a-(")    =    YliN(n)(Y^(n)YliN(n))"1Y1^(n)  (51) 


is  interpreted  as  the  projection  matrix  that  projects  vectors  into  the  subspace  spanned 
by  the  columns  of  Yltjv(n).  (Note  here  that  it  is  assumed  that  (Y^N(n)Y i^(n)j 
exists.)  Also  define 

PiN(n)    =    I-PlfJV(n)  (52) 

as  the  orthogonal  projection  matrix  associated  with  the  same  subspace. 

Both  the  L.S.  estimate  of  y,(")  and  the  prediction  error  can  now  be  expressed 
in  terms  of  the  projections  matrices.  The  estimate  y.(n)  is  the  projection  of  y  (n) 
onto  the  subspace  spanned  by  the  previous  data 

£»     =    PliN(n)Zl(n)  (53) 

The  error  e^??)  is  orthogonal  to  the  estimate  >\(n)- 

e,(n)     =    P£jv(n)2l(n)  (54) 

Next,  we  define  the  operator  Klt/v(7?)  as 

K1JV(n)     =     (YlN(n)YhN(n)YlYlN(n)  (55) 

hence 

a(n)    =    K1,N(n)y1(n)  (56) 

Ki,A'(r?)  can  be  interpreted  as  the  operator  that  computes  the  best  LS  filter  a(n)  for 
predicting  y.(n)  given  the  data  set  Yi^(n).  Now  since  y.(n)  can  be  obtained  from 
y_  (n  —  1)  as  soon  as  yi(n)  is  available,  if  we  find  an  efficient  way  to  get  Kli;\-(n)  from 
Ki,/v(n  —  1)  then  we  will  be  able  to  update  a(n  —  1)  to  a(n). 
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E.     VECTOR  SPACE  CONSIDERATIONS 

Before  going  any  further  we  must  present  some  concepts  and  relations  associated 
with  vector  spaces  that  will  be  needed  later  on.  To  make  the  results  generic,  in  this 
section  our  data  matrix  is  called  U(n).  This  generic  data  matrix  can  represent  the 
matrix  Yi,^(n)  defined  in  (47)  and  (48)  and  other  similar  matrices  that  are  defined 
later  in  this  chapter.  Then  following  the  definition  of  (51)  and  (52),  the  projection 
matrix  associated  with  \J{n)  is 

Pu(n)     =     U(n)(UT(n)U(n))_1UT(n)  (57) 

and  the  orthogonal  projection  matrix  is 

Pfe(n)    =    I-Pu(n)  (58) 

The  columns  of  V(n)  are  formed  by  the  vectors  that  span  the  vector  space  associated 
with  Ptj(70;  hence  when  we  compute  Pn(n)U(n),  we  have 

Pu(n)U(n)     =     U(7?)(uT(n)U(n))_1UT(n)U(n)  =  U(n)  (59) 

' «• ' 

I 

It  follows  from  this  that  the  projection  matrix  is  idempotent 

Pu(")Pu(n)     =     Pu(«)U(n)(UT(n)U(n))"1UT(r7)-Pu(77)  (60) 

U(n) 

The  following  relations  also  follow  from  the  definition  of  Prj(n)  and  Pjj(n). 

P{j(n)  =  Pu.W  (61) 

Pfe(n)Pfe(n)  =  Pfc(n)  (62) 

PuW  =  Pfj(n)  (63) 

P{j(n)Pu(n)  =  0(n+1)x(n+1)  (64) 

All  of  these  relations  are  easy  to  prove. 
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Now  let  us  append  a  matrix  V  (i.e.,  more  columns)  to  the  matrix  U(n).  The 
columns  of  V  do  not  need  to  be  orthogonal  to  the  subspace  spanned  by  U(n).  hence 
the  subspace  spanned  by  [U(n),V]  is  the  same  as  the  one  spanned  by  [U(n),W], 
with 

W    =     Pfc(n)V  (65) 

Then  we  define 

Puv(n)     =     Pu(")  +  Pw(")  (66) 

-     Pu(n)  +  Pfc(n)V  ([P{j(n)V]rPt(n)V)_1  VTPij(n) 

If  we  note  that  Pjjv(r?)  —  I  —  Puv(n);  then  it  follows  that 

Puv(")     -    Ptj(")-Pu(")V([Pij(n)V]TPtJ(n)V)"1VTPiJ(n)         (67) 


These  results  also  are  valid  when  V  is  a  vector  (i.e.,  a  matrix  with  a  single  column). 
Some  other  useful  relations  with  generalized  vectors  or  matrices  y  and  z  which 
follow  immediately  from  (66)  and  (67),  are 

Puv(n)y    =    Pu(")y  +  Pt(")V([PiJ(n)V]TPtJ(r7)V)"1VTPtJ(n)y     (68) 
Puv(")y     =     PTj(n)y-Pi](rOV([Pi(n)V]TPtJ(n)V)"1VrPiJ(n)y     (69) 

zTPuv(«)y   =   zTPu(")y  (70) 


-1 


+    2rPij(n)V([Pi(n)V]TPi(n)V)'   VTPi(n)y 

zTPuv(")y     =     zTPu(")y  (71) 

-     zTPb(nW  ([Pu(^)V]TP^(n)V)_1  VTPi(n)y 

These  relations  with  the  appropriate  choices  of  U(n),V,y,z  will  be  the  basis  of 
several  recursions  needed  later. 
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1.      Projection  Matrix  Time  Update 

Let  us  partition  the  data  matrix  U(n)  as 


U(n)     = 


U(n-l) 

T 


(72) 


where  uT  is  the  last  row  of  U(r?).  We  can  see  from  (47)  that  this  is  a  valid  repre- 
sentation when  U(n)  is  taken  to  be  Ylt^(n).  In  this  case  uT  corresponds  to  yT  N{n)- 
It  is  seen  that  U(n)  is  formed  by  the  columns  of  \J(n  —  1)  with  one  more  dimension 
appended,  the  components  of  uT.  Now  define  a  (n+  l)-dimensional  vector  7r(rc)  called 
the  unit  time  vector  [Ref.  6] 

7r(7?)  =  [0,0,0,---,0,0,0,l]T  (73) 

and  append  it  to  the  data  matrix  U(n)  to  form  [U(n),7r(n)].  It  will  now  be  shown  that 
the  subspace  spanned  by  this  new  matrix  contains  not  only  y.(ft)  but  also  y.(n  —  1). 
To  see  this,  proceed  as  follows.  We  know  that  if  y  (n)  lies  in  the  subspace 
spanned  by  U(n),  then  appending  7r(n)  will  not  change  a  thing.  Using  (68)  with 
V  =  7r(?7)  and  y  =  y  (n)  we  obtain 

PuJn^Cn)     -    Pu(n)y>)  (74) 

s  ^^  > 

yj(n) 

+     Pfe(n)i(n)  ([Ptj(n)7r(n)]rPtJ(n);r(n))"1  7rT(n)PTj(n)yi(n) 

v ^ ' 

o 

To  see  that  y,(n  —  1)  also  lies  in  this  subspace,  note  that  since  7r(n)  has  only  its 
last  component  non-zero,  a  linear  combination  of  the  vectors  in  [U(n),7r(n)]  can  be 
used  to  obtain  a  matrix  whose  columns  span  the  same  subspace  as  the  columns  of 
U(n-  1). 

It  can  be  shown  that  Pu,(n)  has  the  particular  form 


Pu2W  = 


Pu(n-l)    Q 
0T  1 


75) 
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(a  detailed  derivation  of  this  result  is  presented  in  the  Appendix).  Further  recall  that 
y  (n)  is  a  vector  formed  by  all  the  data  from  the  origin  up  to  point  n.  Thus  (75) 
provides  a  way  to  decompose  a  projection  matrix  into  past  and  present  components. 


PUjrHj^H       = 


Pu(n-l)    Q 
0T  0 


2/i(") 


(76) 


+ 


O     0 
0T     1 


yi(")     J      [     y\(n) 

2.      Angle  Parameter 

The  vector  7r(7?)  also  provides  a  way  to  quantify  the  change  of  subspaces 
when  we  update  \J(ii  —  1 )  to  U(r?).  First  note  that  the  inner  product  of  two  vectors 
gives  the  cosine  of  the  angle  between  them  multiplied  by  the  product  of  their  lengths, 
and  also  that  the  length  of  a  vector  resulting  from  the  projection  of  any  vector  into 
a  subspace  is  given  by  the  length  of  the  original  vector  multiplied  by  the  cosine  of 
the  angle  between  the  vector  and  the  subspace  (this  angle  has  always  magnitude 
<  |).  Now  observe  that  7r(n)  is  a  unit  vector  orthogonal  to  the  subspace  spanned  by 
U(n  —  1),  and  Pjj(7i)ji(v)  is  a  vector  orthogonal  to  the  subspace  spanned  by  U(n) 
with  length  equal  to  the  cosine  of  the  angle  between  7r(r?)  and  this  subspace.  Then 
defining  the  inner-product  as 


(a,  h)  =  aTb 


(") 


we  obtain 


7(n)  =  (l(n),Pfr(n)z(n)) 


=  1  x  cos  9  x  cos  6  =  cos2  6 


(78) 


where  6  is  the  angle  between  the  components  of  7r(n)  that  are  perpendicular  to  both 
subspaces  {U(7})}  and  {U(7?  — 1)}.  The  variable  f(n)  will  be  used  to  update  Ki,A'(n). 
In  order  to  begin  the  derivation  of  the  recursive  procedure  we  now  introduce 
three  auxiliarv  filters: 
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-  Forward  Multichannel  Prediction  Filter 

-  Backward  Multichannel  Prediction  Filter 

-  Gain  Transversal  Filter 
These  are  discussed  next. 

F.      AUXILIARY  FILTERS 

The  reason  for  the  auxiliary  filters  will  become  clear  later  when  it  is  shown  that 
the  update  of  a(n  —  1)  can  be  expressed  in  terms  of  the  auxiliary  filters  parameters. 

1.      (M+l)  Channel  Forward  Prediction  Filter 

We  begin  by  defining  a  (M  +  l)-channel  signal  Xf(n)  formed  by  the  data 
acquired  by  the  2-D  mask  when  it  is  moved  from  nton  +  1: 

Xf(")  =  [yi(n),y2(n  +  1),'  •  •  ,VM+i{n  +  1)]T  (79) 

We  want  to  find  the  best  LS  filter  that  predicts  Xf(n)  based  on  the  data  y  N(n) 
covered  by  the  2-D  mask  (see  Figure  5).  Let  the  coefficients  of  this  filter  be  defined 
by  a  {(N  +  \){M  +  1)  -  1)  x  (Af  +  1)  matrix  of  the  form 

F(n)  =  [f1(n),f2(n),...,fM+1(n)]  (80) 

where  each  of  the  ((N  +  1)(M  +  1)  —  l)-dimensional  vectors  f,(n)  for  (1  <  i  <  M  +  1) 
is  comprised  of  the  multichannel  prediction  coefficients  for  channel  i  with  the  same 
support  as  a(n).  The  prediction  of  Xf-(n)  is  given  by 

xF(n)  =  FT(n)yhN(n)  (81) 

and  the  prediction  error  is 

eF(n)  =  xF(n)-xF(n)  (82) 

If  we  define 

XF(n)    =     [y1(n),j:2(n  +  l),---,ZM+i(n  +  1)]  (83) 

=     [xF(0).xF(l),---,xF(n)]T 
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Figure  5.  Forward  Filter  Mask 

then  Xp(n)  is  a  (7?  -f  1)  x  (M  +  1)  matrix  that  contains  all  the  multichannel  data 
from  the  origin  up  to  current  value  of  the  index  n.  The  estimate  of  Xf(n)  is  thus 

XF(n)  =  YliN(n)F(n)  (84) 

and  the  prediction  error,  also  a  (n  -f  1)  X  (M  +  1)  matrix,  is 

EF(77.)  =  XF(n)-XF(n)  (85) 

The  error  covariance  for  the  multichannel  Forward  Filter  is: 

EF(n)  =  ETF(n)EF(n)  (86) 

Since  we  desire  F(?r)  to  minimize  the  error  energy 

ir{ZF(n)]  =  tr[ETF(n)EF(nj\  (87) 

the  optimal  LS  filter  is  again  obtained  using  the  pseudo-inverse  of  Ylt/v(?j) 

F(u)    =     (YlN(n)YhN(n))~lYlN(n)XF(n)  (88) 
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This  can  be  expressed  using  the  operator  defined  in  (55)  as 


F(n)  -  KliN(n)XF(n) 


(89) 


The  orthogonality  principles  mentioned  before  also  apply.  That  is,  the  estimate  Xf(n) 
is  the  projection  of  the  columns  of  XF(n)  onto  the  subspace  spanned  by  the  columns 
of  the  matrix  containing  the  previous  multichannel  data  (which  in  this  case  is  the 
same  as  the  2-D  data). 


XF(n)  =  PhN(n)XF(n) 


(90) 


The  error  EF(n)  is  orthogonal  to  the  estimate  EF(n),  i.e.,  the  columns  of  these 
matrices  span  subspaces  that  are  orthogonal  to  each  other. 

EF(n)  =  P^v(n)XF(n)  (91) 

If  ep{n)  is  defined  as  the  last  row  of  EF(??),  it  can  be  obtained  using  7[_{n)  as 

e?(n)  =  ET(n)EF(n)  =  7r(n)TP^(n)XF(n)  (92) 

2.      (M+l)  Channel  Backward  Prediction  Filter 

For  the  backward  prediction  problem  we  define  a  (M  +  l)-channel  signal 
Xfi(n)  formed  by  the  data  left  out  by  the  2-D  mask  when  it  is  moved  from  time  n  to 
n-f  1.  This  is  given  by  (see  Figure  6) 


xB(n)  =  [yi{7i  -  N),y2{n  -N),--  ■  ,yA/+i(«  -  K)l 


(93) 


Now  define  a  (n  +  1)  x  ((A'  +  l )(./!/ +  1)  —  1)  data  matrix  Y0,Ar_i(n)  that  has  a  structure 
similar  to  Yi^(n)  (47,48) 

r  y»V.(°) 


Y0,A'_i(n)  = 
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(94) 
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Figure  6.  Backward  Filter  Mask 


Y0,N-i(n)     =     [y^)r--r^N+'y,(n),zy2(n),y2(n),..^z-N^y^1(n)}    (95) 


We  note  that 


Xl,N(n)=^,N-l(n-1) 


(96) 


an< 


Y,,w(n)  =  «-,Y0,N-i(n) 


(97) 


The  problem  is  now  to  find  the  best  LS  filter  that  predicts  xB(n)  based  on  the  data 
matrix  Y0)n-i(ti).  Let  the  coefficients  of  this  filter  be  defined  by  the  ((TV  +  1)(M  + 
1)  -  1)  x  {M  +  1)  matrix 

B(n)  =  [b1(n),li2(n)r--,iiM+iW]  (98) 

where  each  of  the  ((Ar  +  1)(M  +  1)  -  l)-dimensional  vectors  b,(n)  for  (1  <  t  <  M+l) 
is  comprised  of  the  backward  prediction  coefficients  for  channel  i.    The  support  of 
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this  filter  is  the  same  as  the  support  of  a(r?)  but  shifted  one  sample  to  the  right  (see 
Figure  6). 

The  prediction  of  xB(n)  is  given  by 


and  the  prediction  error  is 


xB(n)  =  B(n)Ty0N_1(n) 


efi(n)  =  xB{n)  -  xB(n) 


(99) 


(100) 


We  continue  to  proceed  as  we  did  for  the  forward  multichannel  filter.  The  (n  -f  1)  x 
(M  4-  1)  matrix  that  contains  the  backward  multichannel  data  from  the  origin  up  to 
n  is  defined  as 


=     [xB(0),xs(l),...,xB(n)]T 


Then  the  estimate  of  Xfl(n)  is 


XB(n)  =  Y0iN.1(n)B(n) 

and  the  prediction  error  (also  a  (n  -f  1)  x  (M  +  1)  matrix)  is 


EB(n)  =  XB(n)-XB(n) 


(101) 


(102) 


(103) 


The  error  covariance  matrix  for  the  multichannel  backward  filter  is  then  given  by 
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SB(n)  =  E^(n)EB(n)  =  XB(n)J  P0Vi(n)XB(n) 
To  minimize  the  error  energy 

tr[£B(n)]  =  tr  \EB(n)TEB{n) 


(104) 


(105) 


our  optimal  LS  filter  is,  once  more,  obtained  using  a  pseudo-inverse,  but  this  time  for 
the  data  matrix  Y0,a"-i(")- 

B(n)    =     (Y0V1(")Yo,N-i(n))"1Y0Vi(")XB(n)  (106) 
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Finally  defining  a  new  operator  K0,A*-i(n)  in  terms  of  the  new  data  matrix  Yo,at_i(") 

Ko.A--i(n)  =  (Y0Vi(n)Yo,iV-i(n))"1Y0rJv.1(n)  (107) 

we  can  rewrite  (106)  as 

B(n)  =  K0,N-i(n)XB(n)  (108) 

The  orthogonality  principles  apply  once  more.  The  estimate  Xg(n)  is 
the  projection  of  the  columns  of  Xg(n)  onto  the  subspace  spanned  by  the  previous 
backward  multichannel  data 

Xfl(n)  =  P0,N-i(n)XB(n)  (109) 

where  Po,a-i(")  is  the  projection  matrix  associated  with  the  vector  space  spanned 
by  the  columns  of  the  new  data  set  Y0,at-i(")- 

Pojv-i(n)  =  Y0,N-i(n)  (Y07,v_1(n)Y0,.v_1(n))"1  Y0TA._»  (HO) 

The  error  Eg(r?)  is  orthogonal  to  the  estimate  X#(n),  i.e.,  the  columns  of  these  two 
matrices  span  subspaces  that  are  orthogonal  to  each  other. 

E5(n)  =  Po,A-i(")X*(")  (HI) 

Defining  e^(??)  to  be  the  last  row  of  Es(r?)  we  have 

4(n)  =  z(n)TEB(n)  =  7r(n)TP0ViHXB(n)  (112) 

3.      Gain  Transversal  Filter 

The  gain  transversal  filter  does  not  relate  to  specific  prediction  operations 
for  the  data  but  rather  provides  another  way  of  quantifying  the  angular  change  7(n) 
between  the  subspaces  associated  with  data  matrices  at  times  n  and  n  —  1.  To  begin, 
consider  the  projection  of  the  vector  7r(n)  onto  the  subspace  spanned  by  Y0,a-i(^)- 
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Since  this  projection  P0,A'-i(n)7r(n)  is  contained  in  the  subspace  of  Yo,;v_i(n),  we 
can  express  it  as  a  linear  combination  of  the  columns  of  Y0,A_i(n).  We  write  this  as 

Po.JV-i(n)i(n)  =  Y0^_i(n)K(n)  (113) 

where  g(n)  is  a  ((Ar  +  1  )(Af  +  l)  —  l)-dimensional  vector  of  weights.  Note  that  (113)  can 
be  interpreted  as  the  LS  prediction  of  7r(rc)  based  on  the  data  matrix  Y0,A'_i(n)  where 
g_(r?)  is  the  ((A'  +  1)(M  +  1)  —  l)-dimensional  vector  of  filter  prediction  coefficients. 
The  estimate  of  7r(n)  is  thus  given  by 

£(n)  =  Pojv-i  (n)tt(n)  =  Y0,/v-i(n)g»  (114) 

and  the  prediction  error  is 

e.(n)  =  jt(n)  -  P0jv-i(n)l(n)  =  P0Vi("k(")  (H5) 

The  last  component  of  er(??)  can  be  obtained  using  x(n)  and  turns  out  to  be  equal 
to  -y(r>)  as  in  (78) 

eK{n)  =  7r(77)TP^v_1(n)7r(77)  =  7(n)  -  cos2  0  (116) 

Then  substituting  the  middle  part  of  (115)  in  (116)  we  obtain 

7(n)  =  (l(n),a:(n)  -  P0jv-i  (n)i(n))  =  1  -  (7r(n),Poliy-i(n)7r(n))  (117) 

and  using  (113)  in  (117)  we  find 

7(n)  =  1  -  (z(n),Y0ftf-i(n)fi(n))  -  l-^ln^n)  =  cos2  0  (118) 

where  y_TN1(n)  is  the  last  row  of  the  data  matrix  Y0,,v-i(")-  If  we  now  recognize 
that  cos2  6  =  1—  sin2  9  we  see  that 

yoViHsH  =  sin2*  (119) 

If  we  use  the  LS  criteria  to  get  g_(n),  the  solution  is  again,  given  in  terms  of  a  pseudo- 
inverse,  or  more  conveniently  in  terms  of  the  operator  K0tN-i{n)  of  (107) 

g»  =  K0,Ar_i(n)2r(n)  (120) 

This  is  in  the  same  form  that  we  have  for  the  other  filters. 
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G.     FILTER  UPDATE  PROCEDURES 


In  this  section  we  determine  the  time  update  for  the  four  filters  defined  so  far. 
We  begin  by  showing  how  to  update  Klt/v(n)  and  K0,N-i(n)-  This  result  will  be  used 
for  updating  each  of  the  four  filters. 

1.      Transversal  Operator  Update 

Since  the  operations  to  be  described  now  are  common  to  all  four  filters, 
we  develop  the  formulas  in  this  subsection  in  terms  of  the  generic  matrices  U  and  V 
introduced  in  section  E  of  this  chapter.  Beginning  with  (66)  we  have 

Puv(n)  =  Pu(«)  +  PuHV  ([P6(n)V]TPij(n)V)"1  VTP£(n)  (121) 

where  U  is  a  (n  +  1)  x  ((TV  +  1)(M  +  1)  -  1)  matrix  and  V  is  a  (n  +  1)  x  (Af  +  1) 
matrix.  By  definition 

PuvM  =  [U.V]  ([U, V]T[U, V])"1  [U,V]T  (122) 

and 

Kuv(«)=  ([U,V]r[U,V])_1[U5V]T 

It  follows  that 


(123) 


Kuv(n)  =  Kuv(")Puv("! 


(124; 


hence  from  (123) 

Kuv(n)[U,V]    =    I[(jv+i)(Af+i)+M]x[(N+i)(M+i)+Af]  (125) 

In'xA"      On'xM' 

Om'xN'      Ia/'xM' 

with  N'  =  ((JV+1)(M  +  1)  -  1)  and  W  =  (M+l).  Now  we  can  partition  (125)  and 
write  it  as  two  equations,  namely 

I.V'xN' 


Kuv(n)U     = 


Oa/'xN' 


(126) 
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and 

The  first  equation  can  be  used  in  conjunction  with  Pjj  =  U(n)Ku(n)  to  obtain 


KUV(n)V     = 


(127) 


Kuv(n)Pu(n)     = 


Ku(n)  = 


Ku(n) 


C>A/'x(n  +  l) 


(128) 


I/V'xN' 
Oa/'xA" 

Now  substituting  (121)  in  (124)  and  using  (125)-(128)  to  simplify  the  resulting  ex- 
pression we  have 


Kuv(n)     = 


+ 


V 


OA/'x(n+i; 


(129) 


Ku(n) 

Oltfi  x(n+l] 

x     ([Pi(n)V]TPij(n)V)_1VTPt(n) 
By  forming  [V,  U]  and  following  similar  procedures,  it  is  straightforward  to  show  that 


O/V'xA/' 

Ia/'xM' 
-l 


Ku(n) 


Kvu("^ 


+  < 


Ia/'xA/' 
Oa'xA/' 

-1 


Oa/'X(ti+i: 
KuW 


(130) 


Oa/'x(ti+1) 

Ku(n) 
x     ([Pu(»)V]rPij(n)V)"1VrPij(n) 

For  the  particular  case  when  V  =  7r(n)  these  relations  are  also  valid  but  we  can  obtain 
them  from  the  derivation  given  in  the  Appendix.  The  result  is 

Ku(n-l)         0 
-uTKu(n-l)     1 
To  check  this,  note  that  for  V  =  7r(n)  (124)  can  be  written  as 

KU£(rc)  =  KUa(n)PuJn)  (132) 

Then  equating  these  two  ways  of  obtaining  Ktj,*^)  we  can  update  Ktj("  —  1)  to 
Ku(n) 


Ku.(n)    - 


(131) 


Ku(n-l)        0 
■urKu(n-l)     1 


KuW 
0 


Ku(n)T(n) 
-1 


(133) 


x     ([Pt(nk(n)]TPij(n)7L(n))"17rT(n)PiJ(n) 
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2.      2-D  Filter  Update 

We  now  develop  update  formulas  for  the  2-D  filter.  From  (56)  we  have 


a(n)  =  K^njy^n) 


(134) 


To  find  a(n)  in  terms  of  a(r?  —  1)  we  use  (134).  Post-multiplying  by  y,("),  and  taking 
V  =  7r(n)  and  U  =  Yji^v(n)  yields  after  simplification 


KliN(n-l)         0 
-^KliN(n-l)     1 


y>-i) 


= 

'  KltN(n)  ' 

0 

&( 

n)                         (135 

KhN{n)2L{n)  ' 

iWiAW 

-1 

2r7(n)PiLiN(7z)7r(n) 

At  this  point  we  note  from  (97)  that  since 


YijV(n)  =  rlYo|W-1(n 


with  y       (0)  =0  as  defined  initially,  then 


Y,,Ar(n)    = 


Y0,N-i(n-l) 

From  this  it  follows,  using  the  respective  definitions,  that 


KlliV(n)    -     [O.Ko,A--i(n-l)j 


and 


Pi.Ar(n)  = 


0  0T 

0    Po.N-i(n-l) 

These  results  in  conjunction  with  (118)  and  (120)  allow  us  to  write 


g(n  -  1)  =  K0jv-i(n  -  l)l(n  -  1)  =  KltN(n)z(n) 


(136) 


(137 


(138) 


(139) 


(140) 
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and 

-)("-!)    =    l-(l(n-l),Yo1jV-i(n-l)fi(n-l))  =  l-XJtJV_1(n-l)£(n-l) 

=     l-(7L(n),Y1,A-(r7)I(n-l))  =  l-y^(n)g(n-l)  (141) 

and  also 

7(n-l)=£T(n)P^v(n)7r(n)  (142) 

The  upper  part  of  (135)  then  becomes 

a(n  -  1)  =  a(n)  -  g>  -  1)    Cl(w)  (143) 

7(n-l) 

or 

fl(n)  =  a(n  -  1)  +  g(n  -  1)    ,Cl(w)n  (144) 

7(n-l) 

To  get  61(7?)  before  updating  a(n  —  1)  we  substitute  (144)  in 

ei(n)  =  yi(n)  -  y^(n)a(n)  (145) 

to  obtain 

d(n)  =  ei(n|n  -  1)  -  yfN(n)g(n  -  1)    ^(^  (146) 

where  we  define 

«^i("  I"  -  1)  =yi(")-y[,NHa(n-l)  (147) 

Now  (146)  can  be  simplified  using  (141) 

ei(n)  =  e,(n|n  -  1)  +  {i(n  -  1)  -  1)     Cl(w)     =  d(n|n  -  l)7(n  -  1)         (148) 

7(77  -  1) 

and  finally  (144)  can  be  written  as 

a(n)  =  a(n  -  1)  +  g>  -  l)ci(n|n  -  1)  (149) 

At  this  moment  we  have  all  we  need  to  update  a(n  —  1)  to  a(r?)  assuming  that  all 
variables  with  index  77  —  1  are  available.  However  we  want  to  find  g(n)  and  7(72)  in 
order  to  proceed  to  update  a(r?)  to  a(r?  +  1)  as  soon  as  Yitj^(n  +  1)  is  available.  This 
is  discussed  next. 
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3.      Gain  Filter  Update 

Begin  by  forming  a  new  data  matrix  Yo„/v(n)  that  can  be  expressed  in  terms 
of  Yi^'(n)  or  Y0iJv_i(n)  and  the  other  matrices  XF(n)  or  Xg(n)  after  a  multiplication 
by  suitable  permutation  matrices.  Each  row  and  column  of  the  permutation  matrix 
contains  a  single  1  with  all  the  other  entries  equal  to  zero.  The  positions  of  the  l's  are 
chosen  so  that  when  this  matrix  premultiplies  one  of  the  data  matrices  it  rearranges 
its  columns  to  conform  with  the  desired  order.  We  write  this  as 


Y0,A-(n)  =  [XF(n),YliN(n)]*F=[Yo,N-i(n),XB(n)]^B 


(150) 


Now  form  Ko„\(")  and  post-multiply  the  result  by  7r(n)  using  the  first  part  of  (150) 
and  (130)  with  V  =  XF{n)  and  U  =  Y1<N(n)  to  obtain 


Ko.N(n)l(n)     =     H 


OM'x(n+l) 

Ki,*(n) 


/ 


7r(n)  +  <S>i 

-l 


Im' xM' 

■K1,.v(n)XF(n' 


-F(n) 


\ 


Xf(n)P^(n)XF(n; 


V 


XTF{n)P±iN(n)K(n) 


/ 


Ef(n)  /  eF(n) 

T  _   ,T,-1 


Since  the  permutation  matrices  are  orthogonal  ($     =  $     )  we  have 


tfFK0jv(n)7r(n)     = 


OA/'x(n  +  l) 

Ki,N(n) 


7r(n)  + 


Ia/'xA/' 

-F(n) 


(151) 


v-i 


F(")eF(n)      (152: 


By  analogy  with  the  definition  of  gain  filter  (120)  we  can  define 


g'(n)  =  ^fK0,n(")tl(") 


(153) 


or 


,T     / 


tfFg'(n)  =  K0,Ar(n)7r(n) 


(154) 
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as  the  [(JV+  l)(Af  +  1)  -  1]  +  (.A/  +  l)-order  LS  predictor  of  7r(n)  using  the  permuted 
data  matrix  Yo,./v(n)ty]r.  Equation  (152)  can  be  rewritten  using  (140)  as 


g'K 


oAr 

+ 

.£("-!)  . 

EF  (n)eF{n) 


(155) 


Ia/'xA/' 

-F(n) 

An  alternative  way  to  find  K0,Ar(n)7r(n)  is  to  use  the  second  part  of  (150) 
in  (129)  with  V  =  XB(r?)  and  U  =  Y0);v_i(n)  to  obtain 


K0,N(n)iL(n)    =    $1 


+      ^l 


7r(n) 


K0tN-i(n) 

-K0,N-i(")XB(n 
Im'xM' 


(156) 


v    -i 


b     (n)es(n) 


Now  similarly  define 


£"(n)  =  W,A-(n)i(n)  = 


£(") 

+ 

Oa/' 

-B(n) 

Ia/'xM' 


E51(n)eB(n)  (157 


as  the  [(A'  -f  l)(M  -f  1)  —  1]  +  [M  +  l)-order  LS  predictor  of  n(n)  using  the  permuted 
data  matrix  Y1);v_1(77)vp£.  Since 


,T    ii 


Tit 


ns»  =  nil*) 


(158) 


it  follows  that 


g»  =  <Wg'(") 


(159) 


It  is  useful  to  partition  £"(77)  as 


g»  = 


M(n) 

m(n) 


(160) 


where  M(n)  is  a  (Ar  -f  1)(M  +  1)  —  1  vector  and  m(n)  is  a  (M  +  1)  vector.  The  lower 
partition  of  (157)  is  then 


m(n)  =  SB  (n)eB(n) 


(161) 
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This  can  be  substituted  in  the  upper  partition  of  (157)  to  give 


g(n)  =  M(n)  +B(n)m(n) 


(162) 


Thus  it  is  seen  that  if  g'(n)  is  available,  we  can  obtain  g(n),  provided  that  we  have 
F(n),  B(n),  and  all  the  associated  parameters  already  available. 

The  inversions  of  Ef-(n)  ar>d  Sg(n)  can  also  be  carried  out  recursively  using 
the  matrix  inversion  lemma  (details  are  given  in  subsection  7  of  this  chapter). 

4.      Forward  Filter  Update 

To  update  the  forward  filter,  proceed  as  follows.  From  (89)  we  have 


F(n)  =  KlfAr(n)XF(n) 


(163) 


To  find  F(n)  in  terms  of  F(??  —  1)  we  use  (134)  post-multiplied  by  XF(n)  with  V  = 
7r(n)  and  U(n)  =  Y^a^??)  to  obtain 


K1,.v(77   -   1 


0 


-y[A-Ki.A-("-i)  i .       xf(") 


XF(n-l) 


K,,N(n; 

0 


Xf(«) 


K1JV(n)i(n) 

-1 


7rT(n)P^(n)XF(n; 

IT(«)PWn)l(") 


Then  using  only  the  upper  partition  we  find 


F(n-l)  =  F(n)-g(n-l) 


e?(n) 
7(n-l) 


(164) 


(165) 


or 


F(n)  =  F(n-l)+g(n-l)    *f("} 

7(n  -  1) 

To  compute  eF(n)  before  having  F(n  —  1)  we  note  from  (81)  and  (82)  that 


(166) 


eF(n)  =  xF(r?)-FT(n)y      (n) 


(167) 
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and  substitute  (166)  to  obtain 


ej(„)  =  ej(n|n  -  1)  -  yf  w(n)£(n  -  1)-=M 


7(77    -   1) 


(168) 


where  we  define 


eF{n\n  -  1)  =  xF(n)  -  FT(n  -  l)yiN(n) 


(169) 


Now  (168)  can  be  simplified  to 


ef(n) 


eF(n)  =  eF(n|n  -  1)  +  (l(n  -  1)  -  1)  -/Mr  =  eF(n|n  -  l)7(r?  -  1)        (170) 


7(n-l) 


Hence  from  (166)  we  have 


F(n)  =  F(77  -  1)  +  g(n  -  l)eTF(n\n  -  1) 


(171) 


This  is  the  desired  update  formula  for  F(n). 

To  compute  EF(r?)  we  use  (71)  with  U  =  Ylt^(n)  ,  V  =  7r(/7 )  and  z 
y  =  Xf(t?)  and,  also  (75)  to  obtain 

eF(n)e£(n) 


JdI 


XF(n)'Pf)Ni>)XF(n)  =  EF(n)- 


7(n-l) 


Then  using  (170)  we  obtain 


XF(») 


P^v(n-l)     0 


0T  0 

Finally,  simplifying  (173)  we  find 


;F(n-l)=    SF(n)-eF(n)ef(n|n  -  1) 


(172) 


XF(n)  =  EF(n)  -  eF(n)ef(n|»i  -  1)  (173) 


(174) 


or 


)F(n)  =  EF(n  -  1)  +  eF(n)eF(n|n  -  1) 


(175) 


Thus  all  the  parameters  for  the  forward  multichannel  prediction  problem  can  be 
updated  from  the  values  obtained  at  the  end  of  the  (n  —  1)  recursion. 
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5.      Backward  Filter  Update 

The  update  procedure  for  the  backward  filter  is  similar  to  that  for  the 
forward  filter.  From  (108)  we  have 


B(n)  =  K0,N-i(n)XB(n) 


(176) 


To  find  B(r?)  in  terms  of  B(??  —  1)  we  use  (134)  with  V  =  7r(n)  and  U  =  Y0,.\'-i(") 
and  postmultiply  by  X#(n)  to  obtain 


K0,N-i(n-l)  Q 

-yoVi^--^"-1)  i 

K0,A-i(n 
0 


Xfi()i 


Xb(ti-I) 
xj(n) 

K0,N-i(rc)7r(n) 
-1 


(177) 


H1  (n)PjN_,(n)XB(n) 
2LT(n)P^_1(nMn) 


Then  using  only  the  upper  partition  of  this  equation  we  obtain 

eS(n) 


B(n-l)  =  B(n)-g(n)- 


7(n) 


(178) 


or 


B(n)  =  B(n  -  l)  +  g(n) 


gliM 
7(n) 


(179) 


To  compute  e^n)  before  having  B(n  —  1)  we  substitute  (179)  in 


eB(n)  =  xB(n)  -  BT(n)yflJV_1(n) 


(180) 


to  find 


eTB(v)  =  eTB(n\n-\)-£N_1(n)g(n) 


g£(n) 
7(n) 


(181) 


where  we  define 


eB(n|77  -  1)  =  xfl(n)  -  BT(n  -  lJj^Cn) 


(182) 
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Now  (181)  can  be  simplified  to 


<%}("'. 


eJB(n)  =  eB(n\n  -  1)  +  (7(n)  -  1)  -^f  =  ej(n|n  -  lb(n 

7(n) 


(183; 


Therefore 

B(n)  =  B(r7  -  1)  +  g(n)e£(n|n  -  1)  (184) 

We  note  here  that  the  update  of  g(n)  depends  on  B(n)  and  vice- versa,  but 
we  will  address  that  problem  shortly. 

To  compute  £73(77)  we  use  (71)  with  U  =  Yo,/v-i(")  ,  V  =  7r(n)  and 
z  =  y  =  Xg(n),  and  also  (75)  to  obtain 

eB(n)el{n) 


XbMPoVi>)XbM  =  £b(") 


7(n) 


(185) 


Then  using  (183] 


Xj(77) 


PoVi("-D    o 


0' 


o 


XB(n)  =  ES(77)  -  eB(77)eg(77|77  -  1)  (186) 


and  simplifying  we  find 


SB(7?  -  1)  =  SB(n)  -  eB(n)eJ(r?|n  -  1 


(187 


or 


EB(77)  =  EB(77  -  1)  +  eB(n)eTB(n\n  -  1)  (188) 

We  are  now  ready  to  solve  the  problem  of  mutual  dependence  between  g(n) 
and  B(n).  For  this  refer  to  (161)  and  (162).  Substituting  (184)  in  (162)  we  obtain 


g(n)  =  M(n)  +  B(n  -  l)m(n)  +  g(n)eg(n|n  -  l)m(n) 
Now  note  that  since 

ej(77|?7  -  l)m(n)  =  eB{n\n  -  l)Eg1(r7)eB(n) 
is  a  scalar,  (1S9)  becomes 

g(77)  =  [M(t7)  +  B(77  -  l)m(n)]  (1  -  ej(n|n  -  l)m(n))"1 
and  the  problem  is  solved. 


(189) 


(190) 


(191) 
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6.      Angle  Parameter  Update 

The  final  relation  need  to  complete  the  recursion  is  an  update  formula  for 
7(7?).  This  update  is  performed  in  a  manner  similar  to  the  update  of  g(n);  that  is  we 
compute  the  angle  parameter  related  to  the  data  matrix  Y0v/v(n)  using  two  different 
approaches  and  then  equate  them.  By  the  same  procedure  that  lead  to  (118)  we 
define 


l'(n)  =  l-^N(n)K0,N(n)K(n) 

Then  using  (154)  for  K0,A-(n)x(")  and  partitioning  y^  N(n)  into 

y_o,.v(")=  [x?(rc),y[iA,(rO]*F 

we  obtain 


(192) 


(193; 


VM  =  1-  [xr(«),z;iA.(n)jWFg'(n) 


(194) 


and  using  (152)  and  simplifying  we  find 


y'(n)=7(n-l)-eF(n)T,Fl(n)eF(n) 


If  we  now  use  (157)  for  K0,A-(n)7r(n)  and  partition  y^  ,vr(")  as 


4-0,N 


y0,.Y('?)  =  [y<Lv-i(n)'xfi(n). 


VI' 


B 


we  obtain 


(195) 


196) 


V(n)  =  l- 


,T  „», 


This  can  be  simplified  to 


l'(n)  =  -,(n)-eTB(n)ZB1(n)eB(n) 


but  using  (183)  and  (161)  we  obtain 


7(r?)  =  7f(n)[l  -  el{n\n  -  l)m(n)] 


-1 


(197) 


(198) 


(199) 
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where  everything  in  the  right  side  of  (199)  is  available. 

Since  E^n)  does  not  appear  explicitly  in  any  of  the  updates  we  need  only 
to  update  EF  (n).  This  can  be  easily  done  using  the  matrix  inversion  lemma.  The 
specific  procedure  is  outlined  below. 

7.     Inverse  Matrix  Update 

Although  all  the  quantities  have  now  been  derived  that  permit  the  recursion 
to  continue,  we  note  that  the  inverse  matrix  Ejp1,  and  not  the  matrix  itself  occurs  in 
the  recursions.  This  is  a  fairly  small  matrix  and  could  be  inverted  directly.  However  it 
is  more  efficient  to  also  compute  the  inverses  recursively.  This  is  easily  accomplished 
using  the  matrix  inversion  lemma  [Ref.  5,  7].  We  have  from  (175) 

£F(n)  -  EF(n  -  1)  +  eF{n)eTF(n\n  -  1)  (200) 

or  using  (183)  we  have 


EF(??)  =  EF(n  -  1)  +  — : — 

7(n  -  1) 


For 


(201 


The  matrix  inversion  lemma  states  that  if 

A  =  E  +  FG~1FT  (202) 

then 

A"1  =  E"1  -  E_1F[G  +  FTE-1F]-1FTE~1  (203) 


A     -    £F(n)  (204) 

E    =    £F(n-l)  (205) 

F    =    eF{n)  (206) 

G    =    7(n-l)  (207) 
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we  obtain 

£^(n)  =  E?(n  -  1)  -  7  ~ n  .'T/' 7-i7~'  W    7  (20S 


£^(n-l)eF(n)e?(n)S^(n-i; 
7(n-l)+e|:(n)E?(n-l)eF(n) 


The  computation  of  EF  (n)  using  the  matrix  lemma  amounts  to  1.5(il/  +  l)2  +  2.5(A/  + 
1)  multiplications  and  one  division. 

Although  the  matrix  Eg!(n)  could  be  computed  in  a  similar  way,  the  in- 
verse of  Efl(n)  is  not  needed  for  the  2-D  filter. 

H.     ALGORITHM  SUMMARY 

1.      Computational  Complexity 

The  computational  cost  of  the  algorithm  depends  on  the  shape  and  size  of 
the  filter  mask.  For  each  mask  we  must  define,  as  mentioned  before,  both  a  forward 
and  a  backward  multichannel  signal  a  number  of  channels  equal  to  the  number  of  new 
points  acquired  or  dropped  off  by  the  mask.  Call  this  number  I\\.  For  the  quarter 
plane  filter  we  used  I\\  —  M  +  1.  Now  further  define  A'2  as  the  number  of  coefficients 
in  the  2-D  filter  mask  (for  the  case  used  in  the  derivation  A'2  =  {M  +  1)(AT  -f  1)  —  1). 
The  computational  cost  of  the  algorithm  depends  only  on  these  two  numbers.  Note 
that  the  use  of  the  permutation  matrices  only  changes  the  ordering  of  the  elements  in 
the  matrices  affected  by  them,  allowing  the  procedure  to  be  used  for  any  shape  and 
size  of  filter.  The  permutation  can  be  obtained  without  any  multiplications,  hence 
it  will  not  be  considered  in  this  analysis.  We  also  mentioned  before  the  use  of  a 
forgetting  factor  A  in  the  cost  function  to  handle  nonstationary  signals.  The  effect  of 
this  constant  in  the  final  algorithm  shows  up  only  in  the  computation  of  the  inverse 
error  covariance  matrix  for  the  forward  multichannel  filter: 

LF  (n)  =  A        £F(n-l)-— — - — — —  (209) 

V  A7(n-l)  +  eF(n)EF1(n-l)eF(n)y 

This  increases  the  computational  cost  by  Kl  multiplications.  A  detailed  count  of  the 
number  of  operations  required  by  the  algorithm  is  given  in  subsection  3  below.  The 
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method  used  for  the  1-D  RLS  can  be  applied  to  2-D  signals  by  forming  the  A'2  x  A'2 
2-D  deterministic  correlation  matrix  and  the  A'2  x  1  vector  of  deterministic  cross- 
correlation  terms  between  the  desired  filter  response  and  the  filter  inputs.  This  form 
of  2-D  RLS  algorithm  requires  1.5A'|-f  4.5A'2  operations  per  iteration  which  increases 
quadratically  with  A'2. 

2.  Initial  Conditions 

The  recursive  implementation  of  the  algorithm  requires  some  initialization 
for  the  variables  used.  The  assumption  that  the  signal  is  zero  before  iteration  n  =  0 
is  reasonable  and  suggests  that  all  the  filter  parameters  including  those  of  the  gain 
filter,  should  all  be  set  to  zero.  This  choice  of  initial  conditions  implies  that  the 
angle  parameter  -)(0)  must  be  set  to  1.0  since  all  of  the  subspaces  associated  with 
previous  data  are  the  null  space.  However,  a  positive  forward  prediction  error  energy 
is  necessary  for  the  algorithm  to  start.  The  initial  conditions  used  are  therefore 

(210 
(211 
(212 
(213 
(214 

Im'xM'  (215 

6    =     small  positive  constant  (216 

3.  Iteration  at  time  n  and  Required  Arithmetic  Operations 

The  terms  to  be  computed  at  each  iteration,  and  their  formulas  are  given 


a(0) 

=    Q 

F(0) 

=    O 

B(0) 

=    O 

g(0) 

=    Q 

7(0) 

=    1 

v-i 

=  b 

below. 


-  A  priori  2-D  prediction  error  (A'2  operations) 
C!(n|n  -1)  =yi(n)-y*N{n)a{n-l) 


(217) 
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2-D  filter  update  (A'2  operations) 


a(n)  =  a(n  —  1)  +  g(n  —  l)e!(n|n  —  1) 


(218) 


A  priori  multichannel  forward  prediction  error  (A'iA'2  operations) 


eF(r7|n  -  1)  =  xF(n)  -  FT(n  -  l)y1N{n) 


Multichannel  forward  prediction  error  (K\  operations) 


eF{n)  =  eF(n\n  -  1)7(12  -  1) 


(219) 


(220) 


-  Inverse  error  covariance  matrix  for  the  multichannel  forward  filter  (1 .5A'j2  + 
2.5A'j  operations) 

V(n-l)eF(n)eJ(n)S^(n-l) 


F  F  7(n-l)  +  eJ(n)E^(n-l)eF(n) 

-  Multichannel  forward  filter  update  {K\K2  operations) 

Y(n)  =  F(n  -  l)+g(n  -  l)eF{n\n  -  1) 

-  Extended  gain  transversal  filter  {K\  +  K\K2  operations) 


(221) 


(222) 


g"(") 


M(n)  ' 

=  tffltfF| 

oAF 

Ia/'xA/' 

\ 

+ 

SreF(n) 

m(7?) 

V 

.  fi(n-l). 

.  -F(") . 

) 

(223) 


-  Extended  angle  parameter  (A'i  operations  using  previous  results) 


y,(n)=<r(n-l)-eIF(n)XF1(n)eF(n) 


(224) 


-  A  priori  multichannel  backward  prediction  error  (A^A'2  operations) 


es(77|77  -  1)  =  xTB(n)  -  BT(n  -  1)£ „_,&] 


-  Angle  parameter  (A'i  +  1  operations) 


(225) 


7(n)  =  Y(n)[l  -  e5(n|n  -  l)m(n)] 


-1 


(226) 
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-  Gain  transversal  filter  (A'iA'2  +  A'2  operations  using  previous  results) 
g(n)  =  [M{n)  +  B(n  -  l)m(n)]  (1  -  e|(n|n  -  l)m(n))-1  (227) 

-  Multichannel  backward  filter  update  (A'iA'2  operations) 

B(n)  =  B(n  -  1)  +  fi(n)e5(n|n  -  1)  (228) 

The  total  number  of  operations  (multiplications  or  divisions)  required  per  iteration 
by  the  algorithm  is 

2.5/Tj  +  6A\  A'2  +  4.5A'j  +  3A'2  +  1  (229) 
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IV.  EXTENSIONS,  APPLICATIONS  AND  RESULTS 

The  2-D  FRLS  algorithm  was  developed  in  the  previous  chapter  for  a  2-D  pre- 
diction filter  whose  observed  signal  was  the  same  as  the  sequence  under  the  filter 
mask.  In  this  particular  case  the  coefficients  of  fi(rc)  for  the  2-D  filter,  are  identical  to 
the  coefficients  of  the  first  column  fi(n)  of  the  multichannel  filter  F(n).  To  see  this, 
note  that  by  definition,  the  error  energy  for  the  multichannel  forward  prediction  filter 
(87)  can  be  rewritten  as  the  summation  of  the  error  covariance  associated  with  each 
of  the  (M-fT)  channels  of  the  forward  prediction  filter,  where  the  first  term  is  ei(n), 
i.e.,  the  sum  of  squared  errors  for  the  first  channel  (44).  We  can  rewrite  (87)  as 

tr[XF(n)]  =  *r[Ej(n)EF(n)]  =  Cl(n)  +  TF(n)  (230) 

where  Tf  (n)  is  independent  of  the  coefficients  in  fi(n).  The  2-D  filter  coefficients  and 
the  coefficients  in  the  first  column  of  the  forward  multichannel  filter  are  the  result  of 
minimizing  the  same  cost  function  and  are  thus  identical. 

We  will  now  consider  the  case  when  the  data  sequence  under  the  prediction 
mask  is  distinct  from  the  observed  sequence  (general  FIR  Wiener  filter)  and  also  the 
case  when  the  filter  mask  covers  not  only  observation  data,  but  also  other  input  data 
sequence  (ARM A  model).  Following  that  we  will  present  the  results  of  computer 
simulations  to  illustrate  the  applications  of  the  adaptive  algorithms  for  2-D  signal 
processing. 

A.     GENERAL  FIR  WIENER  FILTER 

The  extension  of  the  2-D  FRLS  to  the  general  FIR  Wiener  filtering  problem  is 
straightforward  to  obtain  by  following  the  same  concepts  presented  in  chapter  3.  To 
be  specific  we  again  consider  a  first  quadrant  (N  -f  1)  x  (M  -f  1)  filter  here.  However, 
the  procedure  applies  more  generally  to  nonsymmetric  half  plane  and  other  filters  as 
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discussed  in  section  H  of  chapter  3.  The  multichannel  notation  developed  in  chapter 
3  is  used  here  to  define  two  K  x  L  2-D  sequences  di(n)  and  yi(n),  where  di(n)  is  the 
sequence  we  want  to  estimate  based  upon  the  input  sequence  yi{n).  Our  goal  is  to 
find  a  prediction  filter  of  the  form 

di(n)=^N(n)a(n)  (231) 

with  y        defined  as  in  (39)  and  a(n)  defined  as  in  (41)  that  minimizes  the  sum  of 

squared  errors 

ei(n)  =  X>(0]2  (232) 

»=o 

where  the  prediction  error  ei(n)  is  given  by 

c1(n)  =  <f1(n)-i(n)  (233) 

This  can  be  written  in  vector  notation  as 

ei(n)  =  ef  {n)&1(n)  (234) 

where 

fi!(n)     =    di(n)-di(n)  (235) 

with  d^n)  defined  as  (??  +  l)-dimensional  vector  that  contains  the  observation  data 
from  the  origin  up  to  point  n.  Then  the  estimate  di(ft)  is  given  by 

d1(n)  =  YltN(n)a(n)  (236) 

where  Yi^(n)  is  the  same  data  matrix  as  in  (47)  and  (48).  The  least  squares  solution 
for  a(??)  is  once  more  given  by  the  pseudo-inverse 

a(n)     =     (Y[,N(n)Y1,N(n))"1Y^(n)d1(n)  (237) 

and  the  estimate  of  d,(??)  and  the  prediction  error  can  also  be  expressed  in  terms  of 
the  projection  matrices  defined  in  chapter  3.  The  estimate  dj(n)  is  the  projection  of 
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dj(n)  onto  the  subspace  spanned  by  the  input  data 

di(n)    =    Pijv(n)di(n)  (238) 

The  error  e^n)  is  orthogonal  to  the  estimate  d^n)  and  is  given  by 

*(»»)    =    Pi>(«)di(»)  (239) 

The  operator  KltAr(n)  defined  in  (55)  can  be  used  to  rewrite  a(n)  as 

a(n)     =    KllN(n)jr1(n)  (240) 

We  already  know  how  to  obtain  K^^(n)  from  Ki^'(r?  —  1)  in  a  efficient  way, 
thus  we  are  able  update  a(/?  —  1 )  to  a(n)  as  soon  as  di(n)  is  available.  The  complete 
algorithm  is  the  same  as  the  one  summarized  in  section  H  of  chapter  3  with  yi(n) 
replaced  by  </](??)  in  (217). 

B.     ARMA  MODEL 

The  ARMA  version  of  the  2-D  FRLS  can  be  viewed  as  follows.  Let  us  call 
the  output  or  observed  data  y\(n)  and  the  input  data  Wi(n).  For  the  present  let 
us  assume  that  this  latter  sequence  is  also  known  or  observed.  Let  us  separate  the 
coefficients  that  operate  on  the  two  different  sequences  and  call  a(n)  the  vector  of 
AR  coefficients  of  the  filter,  and  £>(??)  the  vector  of  MA  coefficients  of  the  filter.  As 
before,  we  develop  this  extension  of  the  2-D  FRLS  to  ARMA  models  assuming  a  first 
quadrant  (TV  -f  1 )  X  (M  +  1 )  quarter  plane  mask  for  both  the  AR  and  MA  components 
of  the  filter,  noting  that  more  general  forms  are  possible.  Using  the  scanning  index  n 
defined  before,  we  proceed  by  defining  an  ARMA  prediction  filter  of  the  form 

Mn)=ziN(n)&(n)+X-iAn)Hn)  (241) 

with  y   N(n)  and  wlA.(?i)  defined  using  the  same  concepts  as  in  (39).  We  want  to  find 
a(n)  and  b(n),  the  filter  coefficients  defined  respectively  for  each  mask,  to  minimize 
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the  sum  of  squared  errors 

eiH  =  X>i(0]2  (242) 

«=o 

where  the  prediction  error  is  given  by 

ei(n)  =  yi(n)-y1{n)  (243) 

This  can  be  written  in  vector  notation  as 

ei(n)=fij,(n)£l(n)  (244) 

with 

fii(n)    =    XjW-fcW  (245) 

We  can  combine  the  AR  and  MA  coefficients  in  one  single  vector  c(n)  as 

c(n)    =     [aT(n),bT(n)]r  (246) 

The  data  under  the  mask  can  then  be  expressed  as 

Zi.*(n)     -     [y^("),w[,A.(n)]T  (247) 

Now  we  have  for  the  estimate  of  y,(") 

y1(n)  =  ZliN(n)c(n)  (248) 

where  Zi^(ii)  is  the  data  matrix 

Z1„v(n)  =  [Y1,N(n),W1,N(n)]  (249) 

formed  by  Yi,at(")  ar>d  W^jvfn),  the  data  matrices  associated  respectively  with  the 
AR  and  the  MA  masks  with  the  same  structure  as  (47)  and  (48).  The  least  squares 
solution  for  c(n)  is  given  by  the  pseudo-inverse  of  Zli/v(") 

c(n)    =     (z[,N(n)Z1,N(n))"1Zf)N(n)y1(n)  (250) 

After  defining  new  projection  matrices  and  transversal  filter  operators  associated  with 
the  new  data  matrices,  the  algorithm  to  recursively  update  c(n)  closely  follows  the 
procedures  developed  in  chapter  3. 
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C.      ARMA  MODELING  WITH  UNKNOWN  INPUT 

In  the  previous  section  on  2-D  ARMA  modeling,  it  was  assumed  that  the  se- 
quence Wi(n)  was  known  and  available.  This  is  an  ideal  situation  which  could  some- 
times exist  for  example  in  a  system  identification  problem  (see  Figure  7).  Given  both 


Figure  7.  Modeling  with  Known  Input 

the  input  sequence  iv\(n),  and  the  output  sequence  y\{n)  and  an  assumed  linearity 
and  order  for  the  model,  we  have  all  the  information  necessary  to  identify  the  unknown 
system  under  analysis.  Knowledge  of  the  input  sequence  is  not  always  available,  how- 
ever this  is  the  case,  for  example,  in  the  problem  of  estimating  the  parameters  of  an 
ARMA  model  where  u'i(n)  is  a  2-D  white  noise  sequence.  The  ARMA  parameter 
estimation  problem  for  unknown  input  was  addressed  using  recursive  algorithms  for 
1-D  signals  by  embedding  the  AR  and  MA  estimation  in  a  2-Channel  AR  modeling 
problem  [Ref.  2],  The  difficulties  with  the  procedures  suggested  are  similar  here;  the 
2-D  white  noise  process  u'i(n)  is  not  known  and  needs  to  be  estimated  from  the  data. 
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Assume  that  at  some  moment  n  all  the  data  under  the  MA  mask  is  known 
except  the  most  recent  sample  of  the  noise  sequence  toi(n).  Further  assume  that  the 
ARMA  filter  coefficients  estimated  so  far  are  fairly  close  to  the  actual  coefficients 
that  characterize  the  system.  The  natural  choice  for  an  estimate  for  the  unknown 
noise  sample  is  the  error  ei(n),  i.e.,  we  expect  the  error  to  be  zero  if  the  noise  sample 
was  known.  This  procedure  is  known  as  "bootstrapping"  [Ref.  2]  (see  Figure  8).  It 
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Figure  8.  Modeling  with  Unknown  Input 

involves  two  steps.  First  an  estimate  t/i(n)  is  obtained  assuming  tt>i(n)  =  0  and  using 
the  old  parameter  estimates.  Secondly  w-[(n)  is  set  equal  to  ej(n)  and  we  proceed  as 
in  the  case  of  a  known  input  sequence.  This  method  is  highly  nonlinear  and  hence 
very  difficult  to  analyze.  However  it  was  found  to  give  reasonable  results  in  practice. 

D.      SIMULATION  RESULTS 

The  2-D  FRLS  algorithm  was  tested  both  on  computer-generated  data  and  on 
digitized  images.  For  a  baseline  reference  the  2-D  LMS  algorithm  was  also  imple- 
mented. The  synthetic  data  for  the  system  identification  and  parameter  estimation 
results  was  obtained  by  driving  different  2-D  transfer  functions  with  computer  gen- 
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erated  white  Gaussian  noise.  Most  of  the  tests  were  performed  on  32  x  32  point 
2-D  data  sequences.  Image  coding  was  performed  for  256  x  256  pixel  images  using 
the  VICOM  system  for  display  purposes  and  a  VAX- 785  computer  for  the  algorithm 
implementation.  The  algorithms  were  coded  in  Fortran. 

1.      2-D  System  Identification 

The  first  computer  simulation  in  system  identification  was  performed  using 
a  (2x2)  MA  model.  A  computer-generated  white  Gaussian  noise  sequence  was  applied 
both  to  a  filter  with  known  coefficients  and  to  the  adaptive  filter  in  the  manner  of 
Figure  7.  The  error  between  the  output  of  the  two  filters  was  used  to  adjust  the 
coefficients  of  the  adaptive  filter.  The  MA  filter  had  the  form 

d(nun2)     =     6(0.0)y(771,n2)  +  6(0,l)y(n1,n2-l)  (251) 

+    6(l10)y(n,  -  l,n2)  +  6(l,l)y(n,  -  l,n2  -  1) 

with 

6(0,0)  =  1.0  (252) 

6(0,1)  =  0.6  (253) 

6(1,0)  =  -0.3  (254) 

6(1.1)  =  0.3  (255) 

The  rate  of  convergence  is  shown  in  Figure  9.  As  can  be  seen,  each  of  the  coefficients 
converged  very  rapidly  to  the  actual  value.  The  2-D  LMS  algorithm  was  also  imple- 
mented for  this  case,  but  as  can  be  seen  in  Figure  10  the  convergence  rate  is  very 
slow. 

The  next  simulation  was  performed  using  an  ARMA  model  where  both 
the  AR  and  the  MA  masks  were  first  order  nonsymmetrical  half  plane  filters.  A 
computer-generated  white  Gaussian  noise  sequence  was  applied  both  to  a  filter  with 
known  coefficients  and  to  the  adaptive  filter.    The  error  between  the  output  of  the 
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Figure  9.   System  Identification  MA  Model  * 

two  filters  was  used  to  adjust  the  coefficients  of  the  adaptive  filter.  The  ARM  A  filter 
had  the  form 


y{nun2)     =  a{\,0)y{n1  -  l,n2)  +  a(-l,l)y(n!  +  l,n2  -  1) 

+  a(Otl)y(num  -  1)  +  a(l,l)y(n,  -  l,n2  -  1) 

+  6(1,0)^(71,  -  l,n2)  +  6(-l,l)u;(n1  +  l,n2  -  1) 

+  6(0,l)iw(n1,n2-  1)  +  6(1,  l)!^  -  l,n2-l) 


(256) 


with 


a(l,0)     =  0.8 

a(-l,l)     =  -0.1 

a(0,l)     =  0.4 
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Figure  10.  System  Identification  MA  Model  (2-D  LMS) 


a(l,l) 
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The  rate  of  convergence  is  shown  in  Figure  11  and  Figure  12  for  both  the  AR  and  the 
MA  coefficients.  As  can  be  seen  there,  each  of  the  coefficients  converged  in  about  80 
iterations  to  the  true  value. 

To  test  the  behavior  of  the  algorithm  with  non-stationary  data  the  same 
model  was  run  with  data  obtained  by  changing  some  of  the  coefficients  at  iteration 
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Figure  11.     System  Identification  ARMA  Model  (AR  coefF.     stationary 
data) 
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As  shown  in  Figure  13  and  Figure  14,  the  AR  and  the  MA  coefficients  converged  to 
the  true  initial  values,  but  at  iteration  120  when  some  of  the  coefficients  were  changed 
the  algorithm  started  slowly  tracking  the  new  coefficients.  Since  no  forgetting  factor 
was  used  line,  the  algorithm  does  not  forget  the  initial  data  and  the  convergence  is 
very  slow.    The  estimated  coefficients  remain  biased.      By  using  a  forgetting  factor 


56 


SYSTEM    IUENTIFICRTION 

fiRHR  MODEL        I  MR  COEFF 1CIENTS ) 
Bll,0)--0.2     B(-l,l)-0.1     BIO,  D-0.8     Bll,D—0.5 


LEGEND 
o  -  BI    1,01 
•c  -  BI-1,1  1 
a  -  BI    0,1 ) 
*  -  BI    1,1) 


10.0  50.0  60.0 

ITERATION 


70.0  60.0  90.0  100.0 


Figure  12.  System  Identification  ARMA  Model  (MA  coeff.  stationary 
data) 

of  A  =  0.95  we  obtained  the  results  shown  in  Figure  15  and  Figure  16.  With  the 
introduction  of  this  forgetting  factor  the  filter  coefficients  were  able  to  lock  on  the 
new  coefficients  in  about  150  more  iterations. 

2.      2-D  Parameter  Estimation 

The  first  computer  simulation  in  parameter  estimation  was  performed  us- 
ing a  first  order  nonsymmetric  half  plane  AR  model.  A  computer-  generated  white 
Gaussian  noise  sequence  was  applied  to  a  2-D  AR  filter  with  known  coefficients  to 
obtain  the  data.  The  adaptive  filter  has  access  only  to  the  AR  (output)  sequence  as 
shown  in  Figure  8.  The  error  between  the  output  of  the  two  filters  was  used  to  adjust 
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Figure  13.   System  Identification  ARMA  Model  (AR  coeff.  nonstationary 

data)  A  =  1.0 

the  coefficients  of  the  adaptive  filter.  The  AR  filter  had  the  form 

y(m,n2)    =    a(l, Q)y{n1  -  l,n2) +  a(-l,l)y(n1  +  l,n2  -  1)  (269) 

+     fl(0,l)y(»ii,?i2  -  1)  +  fl(l,l)y(ni  -  l,n2  -  l)  +  w{nun2) 


with 


a(l,0) 

=    0.8 

a(-l,l) 

=    -0.1 

o(0,l) 

=    0.4 

0(1,1) 

=    -0.5 

(270) 
(271) 
(272) 
(273) 
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Figure  14.  System  Identification  ARMA  Model  (MA  coefF.  nonstationary 
data)  A  =  1.0 

The  rate  of  convergence  is  shown  in  Figure  17  and  as  can  be  seen  each  of  the  coeffi- 
cients converged  to  values  close  to  the  true  values.  Since  the  algorithm  does  not  know 
the  input  sequence  and  has  to  estimate  it,  there  is  a  slight  variation  of  the  estimated 
coefficient  around  the  true  values.  Here  again  the  2-D  LMS  algorithm  was  imple- 
mented, but  as  can  be  seen  from  Figure  18  some  of  the  coefficients  did  not  converge 
even  after  900  iterations. 

Next  the  modeling  procedure  with  unknown  input  was  tested  for  the  ARMA 
case.  A  first  order  nonsymmetric  half  plane  mask  was  used  for  both  the  AR  and  MA 
coefficients.  A  computer-generated  white  Gaussian  noise  sequence  was  applied  to  a 
filter  with  known  coefficients  to  obtain  the  ARMA  data.  The  adaptive  filter  does  not 
have  access  to  the  driving  sequence.  The  error  between  the  output  of  the  two  filters 
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Figure  15.  System  Identification  ARM  A  Model  (AR  coefT.  nonstationary 
data)  A  =  0.95 

was  used  to  adjust  the  coefficients  of  the  adaptive  filter.   The  ARMA  filter  had  the 
form 

y(ni,n2)     =  o(l,0)y(n1-l1na)  +  a(-l,l)y(n1  +  l,na-l)  (274) 

+  a{0,\)y{nun2  -  1)  +  a(l,l)y(n1  -  l,n2  -  1) 

+  v)(nun2)  +  6(l,0)u>(n,  -  l,n2)  +  6(-l,l)to(n,  +  l,n2-l) 

+  fc(0,l)u'(?h,"2-  1)  +  6(l,l)t»(ni  -l,n2-  1) 


ith 


Willi 


a(l,0)     =    0.8 
a(-l,l)     =     -0.1 


(275) 
(276) 
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Figure  16.  System  Identification  ARMA  Model  (MA  coeff.  nonstationary 
data)  A  =  0.05 
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The  rate  of  convergence  is  shown  in  Figure  19  and  Figure  20  and  as  can  be  seen  each 
of  the  All  coefficients  converged  to  values  close  to  the  true  values.  This  is  similar  to 
what  happened  for  the  AR  parameter  estimation  problem.  The  MA  coefficients  also 
converged  to  values  acceptably  close  to  the  actual  MA  values,  but  they  show  some 
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Figure  17.  Parameter  Estimation  AR  Model 

bias.  As  in  the  previous  case  the  algorithm  does  not  have  knowledge  of  the  input 
sequence,  hence  estimates  it  using  the  bootstrapping  scheme.  Although  this  method  is 
difficult  to  analyze  due  to  its  non-linear  nature,  the  results  obtained  are  encouraging. 

3.      Image  Coding 

An  image  coding  problem  was  also  used  to  test  the  adaptive  algorithm. 
Two  black  and  white  images,  Figure  21  and  Figure  22,  with  256  x  256  pixels  were  the 
2-D  sequences  used  to  perform  AR  and  ARMA  parameter  estimation  as  described 
above.  The  2-D  LMS  algorithm  was  also  applied  to  the  images  to  estimate  the  AR 
parameters.  These  parameters  were  then  used  to  form  the  linear  predictor  used  in 
the  coding  and  decoding  scheme  shown  in  Figure  23.  A  two  level  quantizer  with  the 
step  size  value  taken  from  the  Max  table  was  used,  assuming  that  the  error  sequence 


62 


PRRRilETER  ESTIMRTION 

RR  MODEL  (2-D  IMS  ALGORITHM) 
Rll,0)-0.8  fll-1,1)— 0.1  fl(0,l)-0.4  011,11—  0.5 


LEGENO 
o  -  Rl  1 ,0) 
o  -  fll-1,1 
a  -  R I  0,1) 
♦  -  Rl  1,1 


0.0  100.0  200.0  300.0 


100.0  500.0 

ITERATION 


600.0        7oo.o        eca.o        eoo.o 


Figure  18.  Parameter  Estimation  AR  Model  (2-D  LMS) 


obtained  had  a  Gaussian  distribution  [Ref.  12].  This  resulted  in  a  quantized  error 
sequence  corresponding  to  one  bit  per  pixel.  The  image  reconstruction  was  performed 
by  driving  the  inverse  filter  with  the  quantized  error  sequence. 

The  first  image  was  reconstructed  after  being  encoded  using  the  three 
different  linear  predictors  and  the  results  are  shown  in  Figures  24,  25  and  26.  The 
error  images  between  the  original  image  and  the  reconstructed  images  are  shown  in 
Figures  27,  28  and  29.  One  of  the  most  widely  used  measures  for  the  performance 
of  a  predictive  coder  is  the  signal-to-noise  ratio  (SNR).  For  a  K  x  L  2-D  sequence 
y{n\-,n2)i  it  can  be  defined  as  follows: 


SNR{dB)  =  10  log 


ESr=oE£r=o*2(*i."') 


(283) 
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Figure  19.  Parameter  Estimation  ARMA  Model  (AR  coeff.) 

The  subjective  quality  of  the  reconstructed  images  agrees  with  the  signal- 
to-noise  ratio  (SNR)  obtained  for  each  case. 


2-D  LMS  (AR)        SNR    =    15.96  <f£ 

2-D  FRLS  (AR)       SNR    =    \1.2AdB 

2-D  FRLS  (ARMA)     SNR    =    18.29  dB 


(284) 
(285) 
(286) 


The  better  performance  of  the  2-D  FRLS  when  compared  with  the  2-D  LMS  is  ap- 
parent in  the  results.  The  improvements  obtained  for  this  image  with  the  ARMA 
model  imply  that  the  model  was  able  to  fit  the  image  better  than  the  AR  model  and, 
thus  produce  an  error  sequence  that  was  more  nearly  white. 

The  algorithm  was  also  tested  on  the  second  image  (Figure  22).  The  results 
are  shown  in  Figures  30,  31  and  32.  The  error  images  between  the  original  image 
and  the  reconstructed  images  are  shown  in  Figures  33,  34  and  35.      The  subjective 
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Figure  20.  Parameter  Estimation  ARMA  Model  (MA  coeff.) 

quality  of  the  reconstructed  images  once  again  agrees  with  the  signal-to-noise  ratio 
(SNR)  obtained  for  each  case,  but  the  quality  difference  between  the  reconstructed 
images  obtained  using  different  methods  is  smaller. 


2-D  LMS  (AR)        SNR    =    17.25  dB 

2-D  FRLS  (AR)       SNR    =    18.16  dB 

2-D  FRLS  (ARMA)     SNR    =    18.62  dB 


(287) 
(288) 
(289) 


In  particular,  the  improvements  obtained  with  the  ARMA  model  for  this  case  are 
not  as  large.  This  is  probably  because  the  second  image  has  a  large  number  of  sharp 
edges,  and  these  are  quite  difficult  to  model  with  any  finite  order  linear  model. 


65 


Figure  21.  Image  1  Original 


Figure  22.  Image  2  Original 
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Figure  23.  Predictive  coding,   (taken  from  [Ref.  G]) 


Figure  24.   linage  1  Reconstructed  2-D  LMS  (AR) 


07 


Figure  25.   Image  1  Reconstructed  2-D  FRLS  (AR) 


Figure  26.  Image  1  Reconstructed  2-D  FRLS  (ARMA) 
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Figure  27.  Image  1  Error  2-D  LMS  (AR) 


Figure  28.   Image  1  Error  2-D  FRLS  (AR) 
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Figure  29.  Image  1  Error  2-D  FRLS  (ARMA) 


Figure  30.  Image  2  Reconstructed  2-D  LMS  (AR) 
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Figure  31.  Image  2  Reconstructed  2-D  FRLS  (AR) 


Figure  32.  Image  2  Reconstructed  2-D  FRLS  (ARMA) 
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Figure  33.  Image  2  Error  2-D  LMS  (AR) 


Figure  34.  Image  2  Error  2-D  FRLS  (AR) 
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Figure  35.   Image  2  Error  2-D  FRLS  (ARMA) 
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V.  CONCLUSIONS  AND  RECOMMENDATIONS 

A  Two-Dimensional  Fast  Recursive  Least  Squares  (2-D  FRLS)  algorithm  was 
developed  using  a  geometric  formulation.  The  derivation  is  based  on  the  relation 
between  least  squares  prediction  and  the  concepts  of  orthogonality  associated  with 
vector  spaces.  The  ordering  necessary  to  develop  the  recursive  algorithm  was  imposed 
on  the  data  by  using  a  linear  scanning  index. 

A  substantial  reduction  in  computational  cost  is  obtained  when  compared  with 
the  basic  2-D  RLS  algorithm.  The  2-D  FRLS  algorithm  requires  on  the  order  of 
6R\I\2  arithmetic  operations  per  iteration  compared  with  1.5/\f  for  the  basic  RLS, 
where  I\\  is  the  number  of  channels  defined  for  the  2-D  FRLS  algorithm  and  K2  is 
the  total  number  of  coefficients  in  the  2-D  filter.  The  2-D  LMS  algorithm,  due  to 
its  simplicity,  is  still  more  economical  than  our  algorithm  in  terms  of  computational 
cost,  but  lacks  the  excellent  convergence  performance  experienced  for  the  2-D  FRLS. 

The  work  described  here  could  be  extended  in  several  different  ways.  First  a 
thorough  investigation  of  the  behavior  of  the  algorithm  when  using  finite  word  length 
implementation  as  well  as  different  forgetting  factors  could  be  developed.  Secondly, 
techniques  used  for  the  1-D  fast  RLS  to  obtain  further  reductions  of  the  computa- 
tional cost  could  be  investigated.  In  particular  a  variant  called  the  gain  normalized 
Fast  Transversal  Filter  [Ref.  6]  seems  to  be  applicable  to  the  2-D  FRLS.  Its  derivation 
however  does  not  follow  directly  from  the  geometrical  approach  presented  here.  Fi- 
nally, the  algorithm  could  be  tested  in  other  areas  including  2-D  parametric  spectral 
estimation. 
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APPENDIX    PROJECTION  MATRIX  UPDATE 


This  appendix  derives  the  results 


Pu.(«)  = 


and 


Kur(n)    - 


Pu(n-l)    Q 
QT  1 


Ku(n-l)        Q 
-uTKu(n-l)     1 


(290) 


(291) 


used  in  Chapter  3. 

Begin  by  noting  that  the  data  matrix  [U(n),  7r(n)]  can  be  partitioned  as 


[U(n),i(n)]  = 


0 
0 

U(n) 

0 

— 

1 

U(n-l)    0 
uT  1 


(292) 


where  u  is  the  last  row  of  U(??).  P^£(r?)  is  defined  by 

PvAn)  =  [U(n),7r(n)]  ([U(n),  7r(n)]T  [U(n),  Tr(n)])"1  [U(n),  7r(n)]T  (293) 


Then  using  (292)  we  have 

([U(n),7r(n)]T[U(77).7r(70] 


-l 


,  -l 


UT(n  -  l)U(n-  l)  +  uuT    u 
uT  1 


(294) 


Here  we  will  use  the  relation  for  the  inverse  of  a  symmetrical  matrix  by  partitioning 
[Ref.  13].   If: 

'  A     B 
C    D 


M  - 


(295) 
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with  C  =  B     and  A  and  D  square  matrices,  then 


M"1  = 


A^+PQ^P7    -PQ-1 

_Q-1PT  Q-l 


(296) 


with 


and 


P  =  A"'B 


Q  =  D-CP 


For  our  case  we  have 


A 
B 

C 
D 


=     UT(n-l)U(n-l)  +  uu: 


u 


=    u 


=     1 


(297) 
(298) 
(299) 
(300) 


The  matrix  inversion  lemma  [Ref.  5,  7]  is  used  first  to  obtain  A   1  as  follows. 
The  matrix  inversion  lemma  states  that  if 


A    =    E  +  FG-'F 


-lT?T 


(301 


then 


A-1     =    E_1-E_1F 


Ttti-1 


G  +  F'E-T 


i  -i 


Tp-1 


FJE 


(302) 


Now  taking 


E    = 
F    =    u 
G    =    1 


UT(n  -  l)U(n  -  1)] 


(303) 

(304) 
(305) 
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we  have 


-l 


=    E'-E^u 


Jr-i, 


=    E"1  - 


1  +  u7E_1u 

K  =  scalar  , 

1  +  A' 
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UyE 


(306) 
(307) 


where  A'  =  u   E  1u.  The  upper  left  partition  of  (296)  becomes 
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To  find  the  other  partitions  we  write 


P     =    A-,B  =  E"1u- 


-l. 


E^uA 
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thus  we  have 


-l 


-PQ-1     =     (UT(n-l)U(n-l))      u 


Now  substituting  (311),  (313)  and  (314)  in  (296)  we  obtain 
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Post  multiplying  this  result  by  the  transpose  of  (292)  we  obtain 


Kur(n)     = 


Ku(n-l)        0 

-UTKu(n-l)     1 

Finally,  premultiplying  by  (292),  (293)  becomes 


Pu»  = 


Pu(n-l)    Q 
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Q.E.D. 


(316) 


(317) 
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